[PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers
@ 2005-11-03 23:59 Linas Vepstas
  2005-11-04  0:42 ` [PATCH 1/42] ppc64: uniform usage of bus unit id interfaces linas
                   ` (43 more replies)
  0 siblings, 44 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-03 23:59 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

What follows is a long sequence of mostly small patches to implement
PCI Error Recovery by adding notification callbacks to the PCI device
driver structure, implementing the recovery in 5 device drivers
(3 ethernet, two scsi drivers), and adding the actual error detection
and recovery code to the ppc64/powerpc arch tree.

Highlights:

-- Patches 1-14: Misc required ppc64/powerpc cleanup/bugfixes/restructuring
-- Patch 15: Overview documentation
-- Patch 16: changes to include/linux/pci.h
-- Patches 17-26: error detection and recovery for pSeries PCI bridge chips
-- Patchs 27-32: recovery patches for ethernet, scsi device drivers
-- Patches 33-42: More misc ppc64-specific changes

Signed-off-by: Linas Vepstas  <linas@austin.ibm.com>


^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 1/42] ppc64: uniform usage of bus unit id interfaces
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
@ 2005-11-04  0:42 ` linas
  2005-11-04  0:47 ` [PATCH 2/42]: ppc64: misc minor cleanup Linas Vepstas
                   ` (42 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: linas @ 2005-11-04  0:42 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel


01-pci-dn-uniformization.patch

This patch changes the rtas_pci interface to use the new struct pci_dn 
structure for two routines that work with pci device nodes.

This patch also does some minor janitorial work: it uses some handy macros 
and cleans up some trailing whitespace in the affected file.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>


Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-10-31 11:59:11.879644789 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c	2005-10-31 12:01:21.403477910 -0600
@@ -71,10 +71,6 @@
  *  and sent out for processing.
  */
 
-/** Bus Unit ID macros; get low and hi 32-bits of the 64-bit BUID */
-#define BUID_HI(buid) ((buid) >> 32)
-#define BUID_LO(buid) ((buid) & 0xffffffff)
-
 /* EEH event workqueue setup. */
 static DEFINE_SPINLOCK(eeh_eventlist_lock);
 LIST_HEAD(eeh_eventlist);
Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h	2005-10-31 11:59:11.880644649 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h	2005-10-31 12:01:21.404477769 -0600
@@ -26,6 +26,10 @@
 
 extern struct pci_dev *ppc64_isabridge_dev;	/* may be NULL if no ISA bus */
 
+/** Bus Unit ID macros; get low and hi 32-bits of the 64-bit BUID */
+#define BUID_HI(buid) ((buid) >> 32)
+#define BUID_LO(buid) ((buid) & 0xffffffff)
+
 /* PCI device_node operations */
 struct device_node;
 typedef void *(*traverse_func)(struct device_node *me, void *data);
Index: linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/rtas_pci.c	2005-10-31 11:59:11.879644789 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c	2005-10-31 12:01:21.407477349 -0600
@@ -5,19 +5,19 @@
  * Copyright (C) 2003 Anton Blanchard <anton@au.ibm.com>, IBM
  *
  * RTAS specific routines for PCI.
- * 
+ *
  * Based on code from pci.c, chrp_pci.c and pSeries_pci.c
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
- *    
+ *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
- * 
+ *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
@@ -47,7 +47,7 @@
 static int ibm_read_pci_config;
 static int ibm_write_pci_config;
 
-static int config_access_valid(struct pci_dn *dn, int where)
+static inline int config_access_valid(struct pci_dn *dn, int where)
 {
 	if (where < 256)
 		return 1;
@@ -72,16 +72,14 @@
         return 0;
 }
 
-static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val)
+static int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
 {
 	int returnval = -1;
 	unsigned long buid, addr;
 	int ret;
-	struct pci_dn *pdn;
 
-	if (!dn || !dn->data)
+	if (!pdn)
 		return PCIBIOS_DEVICE_NOT_FOUND;
-	pdn = dn->data;
 	if (!config_access_valid(pdn, where))
 		return PCIBIOS_BAD_REGISTER_NUMBER;
 
@@ -90,7 +88,7 @@
 	buid = pdn->phb->buid;
 	if (buid) {
 		ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval,
-				addr, buid >> 32, buid & 0xffffffff, size);
+				addr, BUID_HI(buid), BUID_LO(buid), size);
 	} else {
 		ret = rtas_call(read_pci_config, 2, 2, &returnval, addr, size);
 	}
@@ -100,7 +98,7 @@
 		return PCIBIOS_DEVICE_NOT_FOUND;
 
 	if (returnval == EEH_IO_ERROR_VALUE(size) &&
-	    eeh_dn_check_failure (dn, NULL))
+	    eeh_dn_check_failure (pdn->node, NULL))
 		return PCIBIOS_DEVICE_NOT_FOUND;
 
 	return PCIBIOS_SUCCESSFUL;
@@ -118,23 +116,23 @@
 		busdn = bus->sysdata;	/* must be a phb */
 
 	/* Search only direct children of the bus */
-	for (dn = busdn->child; dn; dn = dn->sibling)
-		if (dn->data && PCI_DN(dn)->devfn == devfn
+	for (dn = busdn->child; dn; dn = dn->sibling) {
+		struct pci_dn *pdn = PCI_DN(dn);
+		if (pdn && pdn->devfn == devfn
 		    && of_device_available(dn))
-			return rtas_read_config(dn, where, size, val);
+			return rtas_read_config(pdn, where, size, val);
+	}
 
 	return PCIBIOS_DEVICE_NOT_FOUND;
 }
 
-int rtas_write_config(struct device_node *dn, int where, int size, u32 val)
+int rtas_write_config(struct pci_dn *pdn, int where, int size, u32 val)
 {
 	unsigned long buid, addr;
 	int ret;
-	struct pci_dn *pdn;
 
-	if (!dn || !dn->data)
+	if (!pdn)
 		return PCIBIOS_DEVICE_NOT_FOUND;
-	pdn = dn->data;
 	if (!config_access_valid(pdn, where))
 		return PCIBIOS_BAD_REGISTER_NUMBER;
 
@@ -142,7 +140,8 @@
 		(pdn->devfn << 8) | (where & 0xff);
 	buid = pdn->phb->buid;
 	if (buid) {
-		ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, buid >> 32, buid & 0xffffffff, size, (ulong) val);
+		ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr,
+			BUID_HI(buid), BUID_LO(buid), size, (ulong) val);
 	} else {
 		ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, (ulong)val);
 	}
@@ -165,10 +164,12 @@
 		busdn = bus->sysdata;	/* must be a phb */
 
 	/* Search only direct children of the bus */
-	for (dn = busdn->child; dn; dn = dn->sibling)
-		if (dn->data && PCI_DN(dn)->devfn == devfn
+	for (dn = busdn->child; dn; dn = dn->sibling) {
+		struct pci_dn *pdn = PCI_DN(dn);
+		if (pdn && pdn->devfn == devfn
 		    && of_device_available(dn))
-			return rtas_write_config(dn, where, size, val);
+			return rtas_write_config(pdn, where, size, val);
+	}
 	return PCIBIOS_DEVICE_NOT_FOUND;
 }
 
@@ -221,7 +222,7 @@
 	/* Python's register file is 1 MB in size. */
 	chip_regs = ioremap(reg_struct.address & ~(0xfffffUL), 0x100000);
 
-	/* 
+	/*
 	 * Firmware doesn't always clear this bit which is critical
 	 * for good performance - Anton
 	 */
@@ -292,7 +293,7 @@
 	if (bus_range == NULL || len < 2 * sizeof(int)) {
 		return 1;
  	}
- 
+
 	phb->first_busno =  bus_range[0];
 	phb->last_busno  =  bus_range[1];
 

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 2/42]: ppc64: misc minor cleanup
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
  2005-11-04  0:42 ` [PATCH 1/42] ppc64: uniform usage of bus unit id interfaces linas
@ 2005-11-04  0:47 ` Linas Vepstas
  2005-11-04  0:48 ` [PATCH 3/42]: ppc64: PCI address cache minor fixes Linas Vepstas
                   ` (41 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:47 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

02-eeh-minor-cleanup.patch

This patch performs some minor cleanup of the eeh.c file, including:
-- trim some trailing whitespace
-- remove extraneous #includes
-- use the macro PCI_DN uniformly, instead of the void pointer chase.
-- typos in comments
-- improved debug printk's

Signed-off-by: Linas Vepstas <linas@linas.org>

Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-10-31 12:01:21.403477910 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c	2005-10-31 12:06:16.222121166 -0600
@@ -1,32 +1,31 @@
 /*
  * eeh.c
  * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation
- * 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
- * 
+ *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
- * 
+ *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
 
-#include <linux/bootmem.h>
 #include <linux/init.h>
 #include <linux/list.h>
-#include <linux/mm.h>
 #include <linux/notifier.h>
 #include <linux/pci.h>
 #include <linux/proc_fs.h>
 #include <linux/rbtree.h>
 #include <linux/seq_file.h>
 #include <linux/spinlock.h>
+#include <asm/atomic.h>
 #include <asm/eeh.h>
 #include <asm/io.h>
 #include <asm/machdep.h>
@@ -49,8 +48,8 @@
  *  were "empty": all reads return 0xff's and all writes are silently
  *  ignored.  EEH slot isolation events can be triggered by parity
  *  errors on the address or data busses (e.g. during posted writes),
- *  which in turn might be caused by dust, vibration, humidity,
- *  radioactivity or plain-old failed hardware.
+ *  which in turn might be caused by low voltage on the bus, dust,
+ *  vibration, humidity, radioactivity or plain-old failed hardware.
  *
  *  Note, however, that one of the leading causes of EEH slot
  *  freeze events are buggy device drivers, buggy device microcode,
@@ -256,18 +255,17 @@
 
 	dn = pci_device_to_OF_node(dev);
 	if (!dn) {
-		printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n",
-			pci_name(dev));
+		printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev));
 		return;
 	}
 
 	/* Skip any devices for which EEH is not enabled. */
-	pdn = dn->data;
+	pdn = PCI_DN(dn);
 	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
 	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
 #ifdef DEBUG
-		printk(KERN_INFO "PCI: skip building address cache for=%s\n",
-		       pci_name(dev));
+		printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n",
+		       pci_name(dev), pdn->node->full_name);
 #endif
 		return;
 	}
@@ -410,16 +408,16 @@
  * @dn: device node to read
  * @rets: array to return results in
  */
-static int read_slot_reset_state(struct device_node *dn, int rets[])
+static int read_slot_reset_state(struct pci_dn *pdn, int rets[])
 {
 	int token, outputs;
-	struct pci_dn *pdn = dn->data;
 
 	if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) {
 		token = ibm_read_slot_reset_state2;
 		outputs = 4;
 	} else {
 		token = ibm_read_slot_reset_state;
+		rets[2] = 0; /* fake PE Unavailable info */
 		outputs = 3;
 	}
 
@@ -496,7 +494,7 @@
 
 /**
  * eeh_token_to_phys - convert EEH address token to phys address
- * @token i/o token, should be address in the form 0xE....
+ * @token i/o token, should be address in the form 0xA....
  */
 static inline unsigned long eeh_token_to_phys(unsigned long token)
 {
@@ -522,7 +520,7 @@
  * will query firmware for the EEH status.
  *
  * Returns 0 if there has not been an EEH error; otherwise returns
- * a non-zero value and queues up a solt isolation event notification.
+ * a non-zero value and queues up a slot isolation event notification.
  *
  * It is safe to call this routine in an interrupt context.
  */
@@ -542,7 +540,7 @@
 
 	if (!dn)
 		return 0;
-	pdn = dn->data;
+	pdn = PCI_DN(dn);
 
 	/* Access to IO BARs might get this far and still not want checking. */
 	if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
@@ -562,7 +560,7 @@
 		atomic_inc(&eeh_fail_count);
 		if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) {
 			/* re-read the slot reset state */
-			if (read_slot_reset_state(dn, rets) != 0)
+			if (read_slot_reset_state(pdn, rets) != 0)
 				rets[0] = -1;	/* reset state unknown */
 			eeh_panic(dev, rets[0]);
 		}
@@ -576,7 +574,7 @@
 	 * function zero of a multi-function device.
 	 * In any case they must share a common PHB.
 	 */
-	ret = read_slot_reset_state(dn, rets);
+	ret = read_slot_reset_state(pdn, rets);
 	if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) {
 		__get_cpu_var(false_positives)++;
 		return 0;
@@ -635,7 +633,6 @@
  * @token i/o token, should be address in the form 0xA....
  * @val value, should be all 1's (XXX why do we need this arg??)
  *
- * Check for an eeh failure at the given token address.
  * Check for an EEH failure at the given token address.  Call this
  * routine if the result of a read was all 0xff's and you want to
  * find out if this is due to an EEH slot freeze event.  This routine
@@ -680,7 +677,7 @@
 	u32 *device_id = (u32 *)get_property(dn, "device-id", NULL);
 	u32 *regs;
 	int enable;
-	struct pci_dn *pdn = dn->data;
+	struct pci_dn *pdn = PCI_DN(dn);
 
 	pdn->eeh_mode = 0;
 
@@ -732,7 +729,7 @@
 
 			/* This device doesn't support EEH, but it may have an
 			 * EEH parent, in which case we mark it as supported. */
-			if (dn->parent && dn->parent->data
+			if (dn->parent && PCI_DN(dn->parent)
 			    && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
 				/* Parent supports EEH. */
 				pdn->eeh_mode |= EEH_MODE_SUPPORTED;
@@ -745,7 +742,7 @@
 		       dn->full_name);
 	}
 
-	return NULL; 
+	return NULL;
 }
 
 /*
@@ -793,13 +790,11 @@
 	for (phb = of_find_node_by_name(NULL, "pci"); phb;
 	     phb = of_find_node_by_name(phb, "pci")) {
 		unsigned long buid;
-		struct pci_dn *pci;
 
 		buid = get_phb_buid(phb);
-		if (buid == 0 || phb->data == NULL)
+		if (buid == 0 || PCI_DN(phb) == NULL)
 			continue;
 
-		pci = phb->data;
 		info.buid_lo = BUID_LO(buid);
 		info.buid_hi = BUID_HI(buid);
 		traverse_pci_devices(phb, early_enable_eeh, &info);
@@ -828,11 +823,13 @@
 	struct pci_controller *phb;
 	struct eeh_early_enable_info info;
 
-	if (!dn || !dn->data)
+	if (!dn || !PCI_DN(dn))
 		return;
 	phb = PCI_DN(dn)->phb;
 	if (NULL == phb || 0 == phb->buid) {
-		printk(KERN_WARNING "EEH: Expected buid but found none\n");
+		printk(KERN_WARNING "EEH: Expected buid but found none for %s\n",
+		       dn->full_name);
+		dump_stack();
 		return;
 	}
 

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 3/42]: ppc64: PCI address cache minor fixes
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
  2005-11-04  0:42 ` [PATCH 1/42] ppc64: uniform usage of bus unit id interfaces linas
  2005-11-04  0:47 ` [PATCH 2/42]: ppc64: misc minor cleanup Linas Vepstas
@ 2005-11-04  0:48 ` Linas Vepstas
  2005-11-04  0:48 ` [PATCH 4/42]: ppc64: PCI error rate statistics Linas Vepstas
                   ` (40 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:48 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

03-eeh-addr-cache-cleanup.patch

This is a minor patch to clean up a buglet related to the PCI address cache.
(The buglet doesn't manifes itself unless there are also bugs elsewhere,
which is why its minor.).  Also:

-- Improved debug printing.
-- Declare some private routines as static
-- Adds reference counting to struct pci_dn->pcidev structure

Signed-off-by: Linas Vepstas <linas@linas.org>

Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-10-31 12:07:15.072864803 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c	2005-10-31 12:10:23.985360685 -0600
@@ -219,9 +219,9 @@
 	while (*p) {
 		parent = *p;
 		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
-		if (alo < piar->addr_lo) {
+		if (ahi < piar->addr_lo) {
 			p = &parent->rb_left;
-		} else if (ahi > piar->addr_hi) {
+		} else if (alo > piar->addr_hi) {
 			p = &parent->rb_right;
 		} else {
 			if (dev != piar->pcidev ||
@@ -240,6 +240,11 @@
 	piar->pcidev = dev;
 	piar->flags = flags;
 
+#ifdef DEBUG
+	printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n",
+	                  alo, ahi, pci_name (dev));
+#endif
+
 	rb_link_node(&piar->rb_node, parent, p);
 	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
 
@@ -301,7 +306,7 @@
  * we maintain a cache of devices that can be quickly searched.
  * This routine adds a device to that cache.
  */
-void pci_addr_cache_insert_device(struct pci_dev *dev)
+static void pci_addr_cache_insert_device(struct pci_dev *dev)
 {
 	unsigned long flags;
 
@@ -344,7 +349,7 @@
  * the tree multiple times (once per resource).
  * But so what; device removal doesn't need to be that fast.
  */
-void pci_addr_cache_remove_device(struct pci_dev *dev)
+static void pci_addr_cache_remove_device(struct pci_dev *dev)
 {
 	unsigned long flags;
 
@@ -366,6 +371,9 @@
 {
 	struct pci_dev *dev = NULL;
 
+	if (!eeh_subsystem_enabled)
+		return;
+
 	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
 
 	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
@@ -837,7 +845,7 @@
 	info.buid_lo = BUID_LO(phb->buid);
 	early_enable_eeh(dn, &info);
 }
-EXPORT_SYMBOL(eeh_add_device_early);
+EXPORT_SYMBOL_GPL(eeh_add_device_early);
 
 /**
  * eeh_add_device_late - perform EEH initialization for the indicated pci device
@@ -848,6 +856,8 @@
  */
 void eeh_add_device_late(struct pci_dev *dev)
 {
+	struct device_node *dn;
+
 	if (!dev || !eeh_subsystem_enabled)
 		return;
 
@@ -855,9 +865,13 @@
 	printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev));
 #endif
 
+	pci_dev_get (dev);
+	dn = pci_device_to_OF_node(dev);
+	PCI_DN(dn)->pcidev = dev;
+
 	pci_addr_cache_insert_device (dev);
 }
-EXPORT_SYMBOL(eeh_add_device_late);
+EXPORT_SYMBOL_GPL(eeh_add_device_late);
 
 /**
  * eeh_remove_device - undo EEH setup for the indicated pci device
@@ -868,6 +882,7 @@
  */
 void eeh_remove_device(struct pci_dev *dev)
 {
+	struct device_node *dn;
 	if (!dev || !eeh_subsystem_enabled)
 		return;
 
@@ -876,8 +891,12 @@
 	printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev));
 #endif
 	pci_addr_cache_remove_device(dev);
+
+	dn = pci_device_to_OF_node(dev);
+	PCI_DN(dn)->pcidev = NULL;
+	pci_dev_put (dev);
 }
-EXPORT_SYMBOL(eeh_remove_device);
+EXPORT_SYMBOL_GPL(eeh_remove_device);
 
 static int proc_eeh_show(struct seq_file *m, void *v)
 {
Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h	2005-10-31 12:01:21.404477769 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h	2005-10-31 12:10:06.152862619 -0600
@@ -39,10 +39,6 @@
 void pci_devs_phb_init(void);
 void pci_devs_phb_init_dynamic(struct pci_controller *phb);
 
-/* PCI address cache management routines */
-void pci_addr_cache_insert_device(struct pci_dev *dev);
-void pci_addr_cache_remove_device(struct pci_dev *dev);
-
 /* From rtas_pci.h */
 void init_pci_config_tokens (void);
 unsigned long get_phb_buid (struct device_node *);

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 4/42]: ppc64: PCI error rate statistics
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (2 preceding siblings ...)
  2005-11-04  0:48 ` [PATCH 3/42]: ppc64: PCI address cache minor fixes Linas Vepstas
@ 2005-11-04  0:48 ` Linas Vepstas
  2005-11-04  0:49 ` [PATCH 5/42]: ppc64: RTAS error reporting restructuring Linas Vepstas
                   ` (39 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:48 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

04-eeh-statistics.patch

This minor patch adds some statistics-gathering counters that allow the 
behaviour of the EEH subsystem o be monitored. While far from perfect,
it does provide a rudimentary device that makes understanding of the 
current state of the system a bit easier.

Signed-off-by: Linas Vepstas <linas@linas.org>


Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-10-31 12:10:23.985360685 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c	2005-10-31 12:11:57.134291514 -0600
@@ -102,6 +102,10 @@
 static int eeh_error_buf_size;
 
 /* System monitoring statistics */
+static DEFINE_PER_CPU(unsigned long, no_device);
+static DEFINE_PER_CPU(unsigned long, no_dn);
+static DEFINE_PER_CPU(unsigned long, no_cfg_addr);
+static DEFINE_PER_CPU(unsigned long, ignored_check);
 static DEFINE_PER_CPU(unsigned long, total_mmio_ffs);
 static DEFINE_PER_CPU(unsigned long, false_positives);
 static DEFINE_PER_CPU(unsigned long, ignored_failures);
@@ -493,8 +497,6 @@
 		notifier_call_chain (&eeh_notifier_chain,
 				     EEH_NOTIFY_FREEZE, event);
 
-		__get_cpu_var(slot_resets)++;
-
 		pci_dev_put(event->dev);
 		kfree(event);
 	}
@@ -546,17 +548,24 @@
 	if (!eeh_subsystem_enabled)
 		return 0;
 
-	if (!dn)
+	if (!dn) {
+		__get_cpu_var(no_dn)++;
 		return 0;
+	}
 	pdn = PCI_DN(dn);
 
 	/* Access to IO BARs might get this far and still not want checking. */
 	if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
 	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
+		__get_cpu_var(ignored_check)++;
+#ifdef DEBUG
+		printk ("EEH:ignored check for %s %s\n", pci_name (dev), dn->full_name);
+#endif
 		return 0;
 	}
 
 	if (!pdn->eeh_config_addr) {
+		__get_cpu_var(no_cfg_addr)++;
 		return 0;
 	}
 
@@ -590,6 +599,7 @@
 
 	/* prevent repeated reports of this failure */
 	pdn->eeh_mode |= EEH_MODE_ISOLATED;
+	 __get_cpu_var(slot_resets)++;
 
 	reset_state = rets[0];
 
@@ -657,8 +667,10 @@
 	/* Finding the phys addr + pci device; this is pretty quick. */
 	addr = eeh_token_to_phys((unsigned long __force) token);
 	dev = pci_get_device_by_addr(addr);
-	if (!dev)
+	if (!dev) {
+		__get_cpu_var(no_device)++;
 		return val;
+	}
 
 	dn = pci_device_to_OF_node(dev);
 	eeh_dn_check_failure (dn, dev);
@@ -903,12 +915,17 @@
 	unsigned int cpu;
 	unsigned long ffs = 0, positives = 0, failures = 0;
 	unsigned long resets = 0;
+	unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;
 
 	for_each_cpu(cpu) {
 		ffs += per_cpu(total_mmio_ffs, cpu);
 		positives += per_cpu(false_positives, cpu);
 		failures += per_cpu(ignored_failures, cpu);
 		resets += per_cpu(slot_resets, cpu);
+		no_dev += per_cpu(no_device, cpu);
+		no_dn += per_cpu(no_dn, cpu);
+		no_cfg += per_cpu(no_cfg_addr, cpu);
+		no_check += per_cpu(ignored_check, cpu);
 	}
 
 	if (0 == eeh_subsystem_enabled) {
@@ -916,13 +933,17 @@
 		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
 	} else {
 		seq_printf(m, "EEH Subsystem is enabled\n");
-		seq_printf(m, "eeh_total_mmio_ffs=%ld\n"
-			   "eeh_false_positives=%ld\n"
-			   "eeh_ignored_failures=%ld\n"
-			   "eeh_slot_resets=%ld\n"
-				"eeh_fail_count=%d\n",
-			   ffs, positives, failures, resets,
-				eeh_fail_count.counter);
+		seq_printf(m,
+				"no device=%ld\n"
+				"no device node=%ld\n"
+				"no config address=%ld\n"
+				"check not wanted=%ld\n"
+				"eeh_total_mmio_ffs=%ld\n"
+				"eeh_false_positives=%ld\n"
+				"eeh_ignored_failures=%ld\n"
+				"eeh_slot_resets=%ld\n",
+				no_dev, no_dn, no_cfg, no_check,
+				ffs, positives, failures, resets);
 	}
 
 	return 0;

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 5/42]: ppc64: RTAS error reporting restructuring
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (3 preceding siblings ...)
  2005-11-04  0:48 ` [PATCH 4/42]: ppc64: PCI error rate statistics Linas Vepstas
@ 2005-11-04  0:49 ` Linas Vepstas
  2005-11-04  0:49 ` [PATCH 6/42]: ppc64: avoid PCI error reporting for empty slots Linas Vepstas
                   ` (38 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:49 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

05-eeh-slot-error-detail.patch

This patch encapsulates a section of code that reports the EEH event.
The new subroutine can be used in several places to report the error.

Signed-off-by: Linas Vepstas <linas@linas.org>


Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-10-31 12:11:57.134291514 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c	2005-10-31 12:13:09.282168648 -0600
@@ -397,6 +397,28 @@
 /* --------------------------------------------------------------- */
 /* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */
 
+void eeh_slot_error_detail (struct pci_dn *pdn, int severity)
+{
+	unsigned long flags;
+	int rc;
+
+	/* Log the error with the rtas logger */
+	spin_lock_irqsave(&slot_errbuf_lock, flags);
+	memset(slot_errbuf, 0, eeh_error_buf_size);
+
+	rc = rtas_call(ibm_slot_error_detail,
+	               8, 1, NULL, pdn->eeh_config_addr,
+	               BUID_HI(pdn->phb->buid),
+	               BUID_LO(pdn->phb->buid), NULL, 0,
+	               virt_to_phys(slot_errbuf),
+	               eeh_error_buf_size,
+	               severity);
+
+	if (rc == 0)
+		log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0);
+	spin_unlock_irqrestore(&slot_errbuf_lock, flags);
+}
+
 /**
  * eeh_register_notifier - Register to find out about EEH events.
  * @nb: notifier block to callback on events
@@ -454,9 +476,12 @@
 	 * Since the panic_on_oops sysctl is used to halt the system
 	 * in light of potential corruption, we can use it here.
 	 */
-	if (panic_on_oops)
+	if (panic_on_oops) {
+		struct device_node *dn = pci_device_to_OF_node(dev);
+		eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */);
 		panic("EEH: MMIO failure (%d) on device:%s\n", reset_state,
 		      pci_name(dev));
+	}
 	else {
 		__get_cpu_var(ignored_failures)++;
 		printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n",
@@ -539,7 +564,7 @@
 	int ret;
 	int rets[3];
 	unsigned long flags;
-	int rc, reset_state;
+	int reset_state;
 	struct eeh_event  *event;
 	struct pci_dn *pdn;
 
@@ -603,20 +628,7 @@
 
 	reset_state = rets[0];
 
-	spin_lock_irqsave(&slot_errbuf_lock, flags);
-	memset(slot_errbuf, 0, eeh_error_buf_size);
-
-	rc = rtas_call(ibm_slot_error_detail,
-	               8, 1, NULL, pdn->eeh_config_addr,
-	               BUID_HI(pdn->phb->buid),
-	               BUID_LO(pdn->phb->buid), NULL, 0,
-	               virt_to_phys(slot_errbuf),
-	               eeh_error_buf_size,
-	               1 /* Temporary Error */);
-
-	if (rc == 0)
-		log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0);
-	spin_unlock_irqrestore(&slot_errbuf_lock, flags);
+	eeh_slot_error_detail (pdn, 1 /* Temporary Error */);
 
 	printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n",
 	       rets[0], dn->name, dn->full_name);
@@ -783,6 +795,8 @@
 	struct device_node *phb, *np;
 	struct eeh_early_enable_info info;
 
+	spin_lock_init(&slot_errbuf_lock);
+
 	np = of_find_node_by_path("/rtas");
 	if (np == NULL)
 		return;

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 6/42]: ppc64: avoid PCI error reporting for empty slots
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (4 preceding siblings ...)
  2005-11-04  0:49 ` [PATCH 5/42]: ppc64: RTAS error reporting restructuring Linas Vepstas
@ 2005-11-04  0:49 ` Linas Vepstas
  2005-11-04  0:49 ` [PATCH 7/42]: ppc64: serialize reports of PCI errors Linas Vepstas
                   ` (37 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:49 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

06-eeh-empty-slot-error.patch

Performing PCI config-space reads to empty PCI slots can lead to reports of 
"permanent failure" from the firmware. Ignore permanent failures on empty slots.

Signed-off-by: Linas Vepstas <linas@linas.org>

Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-10-31 12:13:09.282168648 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c	2005-10-31 12:15:26.162962756 -0600
@@ -617,7 +617,32 @@
 	 * In any case they must share a common PHB.
 	 */
 	ret = read_slot_reset_state(pdn, rets);
-	if (!(ret == 0 && rets[1] == 1 && (rets[0] == 2 || rets[0] == 4))) {
+
+	/* If the call to firmware failed, punt */
+	if (ret != 0) {
+		printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n",
+		       ret, dn->full_name);
+		__get_cpu_var(false_positives)++;
+		return 0;
+	}
+
+	/* If EEH is not supported on this device, punt. */
+	if (rets[1] != 1) {
+		printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n",
+		       ret, dn->full_name);
+		__get_cpu_var(false_positives)++;
+		return 0;
+	}
+
+	/* If not the kind of error we know about, punt. */
+	if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) {
+		__get_cpu_var(false_positives)++;
+		return 0;
+	}
+
+	/* Note that config-io to empty slots may fail;
+	 * we recognize empty because they don't have children. */
+	if ((rets[0] == 5) && (dn->child == NULL)) {
 		__get_cpu_var(false_positives)++;
 		return 0;
 	}
@@ -650,7 +675,7 @@
 	/* Most EEH events are due to device driver bugs.  Having
 	 * a stack trace will help the device-driver authors figure
 	 * out what happened.  So print that out. */
-	dump_stack();
+	if (rets[0] != 5) dump_stack();
 	schedule_work(&eeh_event_wq);
 
 	return 0;

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 7/42]: ppc64: serialize reports of PCI errors
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (5 preceding siblings ...)
  2005-11-04  0:49 ` [PATCH 6/42]: ppc64: avoid PCI error reporting for empty slots Linas Vepstas
@ 2005-11-04  0:49 ` Linas Vepstas
  2005-11-04  0:49 ` [PATCH 8/42]: ppc64: escape hatch for spinning interrupt deadlocks Linas Vepstas
                   ` (36 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:49 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

07-eeh-report-race.patch

When a PCI slot is isolated, all PCI functions under that slot are affected.
If hese functions have separate device drivers, the EEH isolation event
might be reported multiple times. This patch adds a lock to prevent the 
racing of such multiple reports. It also marks every device under the slot
as having experienced an EEH event, so that multiple reports may be 
recognized more easily.

Signed-off-by: Linas Vepstas <linas@linas.org>

Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-10-31 12:15:26.162962756 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c	2005-10-31 12:16:19.766441392 -0600
@@ -96,6 +96,9 @@
 
 static int eeh_subsystem_enabled;
 
+/* Lock to avoid races due to multiple reports of an error */
+static DEFINE_SPINLOCK(confirm_error_lock);
+
 /* Buffer for reporting slot-error-detail rtas calls */
 static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
 static DEFINE_SPINLOCK(slot_errbuf_lock);
@@ -544,6 +547,55 @@
 	return pa | (token & (PAGE_SIZE-1));
 }
 
+/** 
+ * Return the "partitionable endpoint" (pe) under which this device lies
+ */
+static struct device_node * find_device_pe(struct device_node *dn)
+{
+	while ((dn->parent) && PCI_DN(dn->parent) &&
+	      (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
+		dn = dn->parent;
+	}
+	return dn;
+}
+
+/** Mark all devices that are peers of this device as failed.
+ *  Mark the device driver too, so that it can see the failure
+ *  immediately; this is critical, since some drivers poll
+ *  status registers in interrupts ... If a driver is polling,
+ *  and the slot is frozen, then the driver can deadlock in
+ *  an interrupt context, which is bad.
+ */
+
+static inline void __eeh_mark_slot (struct device_node *dn)
+{
+	while (dn) {
+		PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED;
+
+		if (dn->child)
+			__eeh_mark_slot (dn->child);
+		dn = dn->sibling;
+	}
+}
+
+static inline void __eeh_clear_slot (struct device_node *dn)
+{
+	while (dn) {
+		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED;
+		if (dn->child)
+			__eeh_clear_slot (dn->child);
+		dn = dn->sibling;
+	}
+}
+
+static inline void eeh_clear_slot (struct device_node *dn)
+{
+	unsigned long flags;
+	spin_lock_irqsave(&confirm_error_lock, flags);
+	__eeh_clear_slot (dn);
+	spin_unlock_irqrestore(&confirm_error_lock, flags);
+}
+
 /**
  * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze
  * @dn device node
@@ -567,6 +619,8 @@
 	int reset_state;
 	struct eeh_event  *event;
 	struct pci_dn *pdn;
+	struct device_node *pe_dn;
+	int rc = 0;
 
 	__get_cpu_var(total_mmio_ffs)++;
 
@@ -594,10 +648,14 @@
 		return 0;
 	}
 
-	/*
-	 * If we already have a pending isolation event for this
-	 * slot, we know it's bad already, we don't need to check...
+	/* If we already have a pending isolation event for this
+	 * slot, we know it's bad already, we don't need to check.
+	 * Do this checking under a lock; as multiple PCI devices
+	 * in one slot might report errors simultaneously, and we
+	 * only want one error recovery routine running.
 	 */
+	spin_lock_irqsave(&confirm_error_lock, flags);
+	rc = 1;
 	if (pdn->eeh_mode & EEH_MODE_ISOLATED) {
 		atomic_inc(&eeh_fail_count);
 		if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) {
@@ -606,7 +664,7 @@
 				rets[0] = -1;	/* reset state unknown */
 			eeh_panic(dev, rets[0]);
 		}
-		return 0;
+		goto dn_unlock;
 	}
 
 	/*
@@ -623,7 +681,8 @@
 		printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n",
 		       ret, dn->full_name);
 		__get_cpu_var(false_positives)++;
-		return 0;
+		rc = 0;
+		goto dn_unlock;
 	}
 
 	/* If EEH is not supported on this device, punt. */
@@ -631,25 +690,33 @@
 		printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n",
 		       ret, dn->full_name);
 		__get_cpu_var(false_positives)++;
-		return 0;
+		rc = 0;
+		goto dn_unlock;
 	}
 
 	/* If not the kind of error we know about, punt. */
 	if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) {
 		__get_cpu_var(false_positives)++;
-		return 0;
+		rc = 0;
+		goto dn_unlock;
 	}
 
 	/* Note that config-io to empty slots may fail;
 	 * we recognize empty because they don't have children. */
 	if ((rets[0] == 5) && (dn->child == NULL)) {
 		__get_cpu_var(false_positives)++;
-		return 0;
+		rc = 0;
+		goto dn_unlock;
 	}
 
-	/* prevent repeated reports of this failure */
-	pdn->eeh_mode |= EEH_MODE_ISOLATED;
-	 __get_cpu_var(slot_resets)++;
+	__get_cpu_var(slot_resets)++;
+ 
+	/* Avoid repeated reports of this failure, including problems
+	 * with other functions on this device, and functions under
+	 * bridges. */
+	pe_dn = find_device_pe (dn);
+	__eeh_mark_slot (pe_dn);
+	spin_unlock_irqrestore(&confirm_error_lock, flags);
 
 	reset_state = rets[0];
 
@@ -678,10 +745,14 @@
 	if (rets[0] != 5) dump_stack();
 	schedule_work(&eeh_event_wq);
 
-	return 0;
+	return 1;
+
+dn_unlock:
+	spin_unlock_irqrestore(&confirm_error_lock, flags);
+	return rc;
 }
 
-EXPORT_SYMBOL(eeh_dn_check_failure);
+EXPORT_SYMBOL_GPL(eeh_dn_check_failure);
 
 /**
  * eeh_check_failure - check if all 1's data is due to EEH slot freeze
@@ -820,6 +891,7 @@
 	struct device_node *phb, *np;
 	struct eeh_early_enable_info info;
 
+	spin_lock_init(&confirm_error_lock);
 	spin_lock_init(&slot_errbuf_lock);
 
 	np = of_find_node_by_path("/rtas");

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 8/42]: ppc64: escape hatch for spinning interrupt deadlocks
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (6 preceding siblings ...)
  2005-11-04  0:49 ` [PATCH 7/42]: ppc64: serialize reports of PCI errors Linas Vepstas
@ 2005-11-04  0:49 ` Linas Vepstas
  2005-11-04  0:49 ` [PATCH 9/42]: ppc64: bugfix: crash on PCI hotplug Linas Vepstas
                   ` (35 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:49 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

08-eeh-spin-counter.patch

One an EEH event is triggers, all further I/O to a device is blocked (until
reset).  Bad device drivers may end up spinning in their interrupt handlers, 
trying to read an interrupt status register that will never change state.
This patch moves that spin counter to a per-device structure, and adds
some diagnostic prints to help locate the bad driver.

Signed-off-by: Linas Vepstas <linas@linas.org>

Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-10-31 12:16:19.766441392 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c	2005-10-31 12:18:21.924300428 -0600
@@ -78,14 +78,12 @@
 
 static struct notifier_block *eeh_notifier_chain;
 
-/*
- * If a device driver keeps reading an MMIO register in an interrupt
+/* If a device driver keeps reading an MMIO register in an interrupt
  * handler after a slot isolation event has occurred, we assume it
  * is broken and panic.  This sets the threshold for how many read
  * attempts we allow before panicking.
  */
-#define EEH_MAX_FAILS	1000
-static atomic_t eeh_fail_count;
+#define EEH_MAX_FAILS	100000
 
 /* RTAS tokens */
 static int ibm_set_eeh_option;
@@ -521,7 +519,6 @@
 		       "%s\n", event->reset_state,
 		       pci_name(event->dev));
 
-		atomic_set(&eeh_fail_count, 0);
 		notifier_call_chain (&eeh_notifier_chain,
 				     EEH_NOTIFY_FREEZE, event);
 
@@ -657,12 +654,18 @@
 	spin_lock_irqsave(&confirm_error_lock, flags);
 	rc = 1;
 	if (pdn->eeh_mode & EEH_MODE_ISOLATED) {
-		atomic_inc(&eeh_fail_count);
-		if (atomic_read(&eeh_fail_count) >= EEH_MAX_FAILS) {
+		pdn->eeh_check_count ++;
+		if (pdn->eeh_check_count >= EEH_MAX_FAILS) {
+			printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n",
+			        pdn->eeh_check_count);
+			dump_stack();
+			
 			/* re-read the slot reset state */
 			if (read_slot_reset_state(pdn, rets) != 0)
 				rets[0] = -1;	/* reset state unknown */
-			eeh_panic(dev, rets[0]);
+
+			/* If we are here, then we hit an infinite loop. Stop. */
+			panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev));
 		}
 		goto dn_unlock;
 	}
@@ -808,6 +811,8 @@
 	struct pci_dn *pdn = PCI_DN(dn);
 
 	pdn->eeh_mode = 0;
+	pdn->eeh_check_count = 0;
+	pdn->eeh_freeze_count = 0;
 
 	if (status && strcmp(status, "ok") != 0)
 		return NULL;	/* ignore devices with bad status */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 9/42]: ppc64: bugfix: crash on PCI hotplug
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (7 preceding siblings ...)
  2005-11-04  0:49 ` [PATCH 8/42]: ppc64: escape hatch for spinning interrupt deadlocks Linas Vepstas
@ 2005-11-04  0:49 ` Linas Vepstas
  2005-11-04  0:49 ` [PATCH 10/42]: ppc64: bugfix: don't silently gnore PCI errors Linas Vepstas
                   ` (34 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:49 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

09-hotplug-bugfix.patch

In the current 2.6.14-rc2-git6 kernel, performing a Dynamic LPAR Add 
of a hotplug slot will crash the system, with the following (abbreviated) 
stack trace:

cpu 0x3: Vector: 700 (Program Check) at [c000000053dff7f0]
    pc: c0000000004f5974: .__alloc_bootmem+0x0/0xb0
    lr: c0000000000258a0: .update_dn_pci_info+0x108/0x118
        c0000000000257c8 .update_dn_pci_info+0x30/0x118 (unreliable)
        c0000000000258fc .pci_dn_reconfig_notifier+0x4c/0x64
        c000000000060754 .notifier_call_chain+0x68/0x9c

The root cause was that __init __alloc_bootmem() was called long after 
boot had finished, resulting in a crash because this routine is undefined
after boot time.  The patch below fixes this crash, and adds some docs to 
clarify the code.

p.s. congrats to all for getting slashdotted on this yesterday!

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Mailed to: paulus@samba.org
CC: linuxppc64-dev@ozlabs.org, linux-kernel@vger.kernel.org, johnrose@linux.ibm.com
On Monday 3 October 2005

revised on 4 Ocober to
[PATCH 1/2] ppc64: Crash in DLPAR code on PCI hotplug add

Index: linux-2.6.14-git3/arch/ppc64/kernel/pci_dn.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/pci_dn.c	2005-10-31 12:19:03.211506966 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/pci_dn.c	2005-10-31 12:19:47.420303479 -0600
@@ -43,7 +43,7 @@
 	u32 *regs;
 	struct pci_dn *pdn;
 
-	if (phb->is_dynamic)
+	if (mem_init_done)
 		pdn = kmalloc(sizeof(*pdn), GFP_KERNEL);
 	else
 		pdn = alloc_bootmem(sizeof(*pdn));
@@ -120,6 +120,14 @@
 	return NULL;
 }
 
+/** 
+ * pci_devs_phb_init_dynamic - setup pci devices under this PHB
+ * phb: pci-to-host bridge (top-level bridge connecting to cpu)
+ *
+ * This routine is called both during boot, (before the memory
+ * subsystem is set up, before kmalloc is valid) and during the 
+ * dynamic lpar operation of adding a PHB to a running system.
+ */
 void __devinit pci_devs_phb_init_dynamic(struct pci_controller *phb)
 {
 	struct device_node * dn = (struct device_node *) phb->arch_data;
@@ -200,9 +208,14 @@
 	.notifier_call = pci_dn_reconfig_notifier,
 };
 
-/*
- * Actually initialize the phbs.
- * The buswalk on this phb has not happened yet.
+/** 
+ * pci_devs_phb_init - Initialize phbs and pci devs under them.
+ * 
+ * This routine walks over all phb's (pci-host bridges) on the
+ * system, and sets up assorted pci-related structures 
+ * (including pci info in the device node structs) for each
+ * pci device found underneath.  This routine runs once,
+ * early in the boot sequence.
  */
 void __init pci_devs_phb_init(void)
 {

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 10/42]: ppc64: bugfix: don't silently gnore PCI errors
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (8 preceding siblings ...)
  2005-11-04  0:49 ` [PATCH 9/42]: ppc64: bugfix: crash on PCI hotplug Linas Vepstas
@ 2005-11-04  0:49 ` Linas Vepstas
  2005-11-04  0:49 ` [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64 Linas Vepstas
                   ` (33 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:49 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

10-EEH-enable-bugfix.patch

Bugfix: With the curent linux-2.6.14-rc2-git6, EEH errors are 
ignored because thier detection requires an unusued, uninitialized 
flag to be set.  This patch removes the unused flag.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>


Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-10-31 12:54:20.919034814 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/eeh.c	2005-10-31 12:54:48.165215962 -0600
@@ -631,11 +631,12 @@
 	pdn = PCI_DN(dn);
 
 	/* Access to IO BARs might get this far and still not want checking. */
-	if (!pdn->eeh_capable || !(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
+	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
 	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
 		__get_cpu_var(ignored_check)++;
 #ifdef DEBUG
-		printk ("EEH:ignored check for %s %s\n", pci_name (dev), dn->full_name);
+		printk ("EEH:ignored check (%x) for %s %s\n", 
+		        pdn->eeh_mode, pci_name (dev), dn->full_name);
 #endif
 		return 0;
 	}
Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h	2005-10-31 12:54:20.919034814 -0600
+++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h	2005-10-31 12:54:48.167215682 -0600
@@ -63,7 +63,6 @@
 	int	devfn;			/* for pci devices */
 	int	eeh_mode;		/* See eeh.h for possible EEH_MODEs */
 	int	eeh_config_addr;
-	int	eeh_capable;		/* from firmware */
 	int 	eeh_check_count;	/* # times driver ignored error */
 	int 	eeh_freeze_count;	/* # times this device froze up. */
 	int	eeh_is_bridge;		/* device is pci-to-pci bridge */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (9 preceding siblings ...)
  2005-11-04  0:49 ` [PATCH 10/42]: ppc64: bugfix: don't silently gnore PCI errors Linas Vepstas
@ 2005-11-04  0:49 ` Linas Vepstas
  2005-11-04  0:50 ` [PATCH 12/42]: ppc64: PCI error event dispatcher Linas Vepstas
                   ` (32 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:49 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

11-eeh-move-to-powerpc.patch

Move arch/ppc64/kernel/eeh.c to arch//powerpc/platforms/pseries/eeh.c
No other changes (except for Makefile to build it)

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-11-02 14:29:22.485829789 -0600
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,1093 +0,0 @@
-/*
- * eeh.c
- * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
- */
-
-#include <linux/init.h>
-#include <linux/list.h>
-#include <linux/notifier.h>
-#include <linux/pci.h>
-#include <linux/proc_fs.h>
-#include <linux/rbtree.h>
-#include <linux/seq_file.h>
-#include <linux/spinlock.h>
-#include <asm/atomic.h>
-#include <asm/eeh.h>
-#include <asm/io.h>
-#include <asm/machdep.h>
-#include <asm/rtas.h>
-#include <asm/atomic.h>
-#include <asm/systemcfg.h>
-#include <asm/ppc-pci.h>
-
-#undef DEBUG
-
-/** Overview:
- *  EEH, or "Extended Error Handling" is a PCI bridge technology for
- *  dealing with PCI bus errors that can't be dealt with within the
- *  usual PCI framework, except by check-stopping the CPU.  Systems
- *  that are designed for high-availability/reliability cannot afford
- *  to crash due to a "mere" PCI error, thus the need for EEH.
- *  An EEH-capable bridge operates by converting a detected error
- *  into a "slot freeze", taking the PCI adapter off-line, making
- *  the slot behave, from the OS'es point of view, as if the slot
- *  were "empty": all reads return 0xff's and all writes are silently
- *  ignored.  EEH slot isolation events can be triggered by parity
- *  errors on the address or data busses (e.g. during posted writes),
- *  which in turn might be caused by low voltage on the bus, dust,
- *  vibration, humidity, radioactivity or plain-old failed hardware.
- *
- *  Note, however, that one of the leading causes of EEH slot
- *  freeze events are buggy device drivers, buggy device microcode,
- *  or buggy device hardware.  This is because any attempt by the
- *  device to bus-master data to a memory address that is not
- *  assigned to the device will trigger a slot freeze.   (The idea
- *  is to prevent devices-gone-wild from corrupting system memory).
- *  Buggy hardware/drivers will have a miserable time co-existing
- *  with EEH.
- *
- *  Ideally, a PCI device driver, when suspecting that an isolation
- *  event has occured (e.g. by reading 0xff's), will then ask EEH
- *  whether this is the case, and then take appropriate steps to
- *  reset the PCI slot, the PCI device, and then resume operations.
- *  However, until that day,  the checking is done here, with the
- *  eeh_check_failure() routine embedded in the MMIO macros.  If
- *  the slot is found to be isolated, an "EEH Event" is synthesized
- *  and sent out for processing.
- */
-
-/* EEH event workqueue setup. */
-static DEFINE_SPINLOCK(eeh_eventlist_lock);
-LIST_HEAD(eeh_eventlist);
-static void eeh_event_handler(void *);
-DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL);
-
-static struct notifier_block *eeh_notifier_chain;
-
-/* If a device driver keeps reading an MMIO register in an interrupt
- * handler after a slot isolation event has occurred, we assume it
- * is broken and panic.  This sets the threshold for how many read
- * attempts we allow before panicking.
- */
-#define EEH_MAX_FAILS	100000
-
-/* RTAS tokens */
-static int ibm_set_eeh_option;
-static int ibm_set_slot_reset;
-static int ibm_read_slot_reset_state;
-static int ibm_read_slot_reset_state2;
-static int ibm_slot_error_detail;
-
-static int eeh_subsystem_enabled;
-
-/* Lock to avoid races due to multiple reports of an error */
-static DEFINE_SPINLOCK(confirm_error_lock);
-
-/* Buffer for reporting slot-error-detail rtas calls */
-static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
-static DEFINE_SPINLOCK(slot_errbuf_lock);
-static int eeh_error_buf_size;
-
-/* System monitoring statistics */
-static DEFINE_PER_CPU(unsigned long, no_device);
-static DEFINE_PER_CPU(unsigned long, no_dn);
-static DEFINE_PER_CPU(unsigned long, no_cfg_addr);
-static DEFINE_PER_CPU(unsigned long, ignored_check);
-static DEFINE_PER_CPU(unsigned long, total_mmio_ffs);
-static DEFINE_PER_CPU(unsigned long, false_positives);
-static DEFINE_PER_CPU(unsigned long, ignored_failures);
-static DEFINE_PER_CPU(unsigned long, slot_resets);
-
-/**
- * The pci address cache subsystem.  This subsystem places
- * PCI device address resources into a red-black tree, sorted
- * according to the address range, so that given only an i/o
- * address, the corresponding PCI device can be **quickly**
- * found. It is safe to perform an address lookup in an interrupt
- * context; this ability is an important feature.
- *
- * Currently, the only customer of this code is the EEH subsystem;
- * thus, this code has been somewhat tailored to suit EEH better.
- * In particular, the cache does *not* hold the addresses of devices
- * for which EEH is not enabled.
- *
- * (Implementation Note: The RB tree seems to be better/faster
- * than any hash algo I could think of for this problem, even
- * with the penalty of slow pointer chases for d-cache misses).
- */
-struct pci_io_addr_range
-{
-	struct rb_node rb_node;
-	unsigned long addr_lo;
-	unsigned long addr_hi;
-	struct pci_dev *pcidev;
-	unsigned int flags;
-};
-
-static struct pci_io_addr_cache
-{
-	struct rb_root rb_root;
-	spinlock_t piar_lock;
-} pci_io_addr_cache_root;
-
-static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr)
-{
-	struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node;
-
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-
-		if (addr < piar->addr_lo) {
-			n = n->rb_left;
-		} else {
-			if (addr > piar->addr_hi) {
-				n = n->rb_right;
-			} else {
-				pci_dev_get(piar->pcidev);
-				return piar->pcidev;
-			}
-		}
-	}
-
-	return NULL;
-}
-
-/**
- * pci_get_device_by_addr - Get device, given only address
- * @addr: mmio (PIO) phys address or i/o port number
- *
- * Given an mmio phys address, or a port number, find a pci device
- * that implements this address.  Be sure to pci_dev_put the device
- * when finished.  I/O port numbers are assumed to be offset
- * from zero (that is, they do *not* have pci_io_addr added in).
- * It is safe to call this function within an interrupt.
- */
-static struct pci_dev *pci_get_device_by_addr(unsigned long addr)
-{
-	struct pci_dev *dev;
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	dev = __pci_get_device_by_addr(addr);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-	return dev;
-}
-
-#ifdef DEBUG
-/*
- * Handy-dandy debug print routine, does nothing more
- * than print out the contents of our addr cache.
- */
-static void pci_addr_cache_print(struct pci_io_addr_cache *cache)
-{
-	struct rb_node *n;
-	int cnt = 0;
-
-	n = rb_first(&cache->rb_root);
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-		printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n",
-		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
-		       piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev));
-		cnt++;
-		n = rb_next(n);
-	}
-}
-#endif
-
-/* Insert address range into the rb tree. */
-static struct pci_io_addr_range *
-pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
-		      unsigned long ahi, unsigned int flags)
-{
-	struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node;
-	struct rb_node *parent = NULL;
-	struct pci_io_addr_range *piar;
-
-	/* Walk tree, find a place to insert into tree */
-	while (*p) {
-		parent = *p;
-		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
-		if (ahi < piar->addr_lo) {
-			p = &parent->rb_left;
-		} else if (alo > piar->addr_hi) {
-			p = &parent->rb_right;
-		} else {
-			if (dev != piar->pcidev ||
-			    alo != piar->addr_lo || ahi != piar->addr_hi) {
-				printk(KERN_WARNING "PIAR: overlapping address range\n");
-			}
-			return piar;
-		}
-	}
-	piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
-	if (!piar)
-		return NULL;
-
-	piar->addr_lo = alo;
-	piar->addr_hi = ahi;
-	piar->pcidev = dev;
-	piar->flags = flags;
-
-#ifdef DEBUG
-	printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n",
-	                  alo, ahi, pci_name (dev));
-#endif
-
-	rb_link_node(&piar->rb_node, parent, p);
-	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
-
-	return piar;
-}
-
-static void __pci_addr_cache_insert_device(struct pci_dev *dev)
-{
-	struct device_node *dn;
-	struct pci_dn *pdn;
-	int i;
-	int inserted = 0;
-
-	dn = pci_device_to_OF_node(dev);
-	if (!dn) {
-		printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev));
-		return;
-	}
-
-	/* Skip any devices for which EEH is not enabled. */
-	pdn = PCI_DN(dn);
-	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
-	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
-#ifdef DEBUG
-		printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n",
-		       pci_name(dev), pdn->node->full_name);
-#endif
-		return;
-	}
-
-	/* The cache holds a reference to the device... */
-	pci_dev_get(dev);
-
-	/* Walk resources on this device, poke them into the tree */
-	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
-		unsigned long start = pci_resource_start(dev,i);
-		unsigned long end = pci_resource_end(dev,i);
-		unsigned int flags = pci_resource_flags(dev,i);
-
-		/* We are interested only bus addresses, not dma or other stuff */
-		if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM)))
-			continue;
-		if (start == 0 || ~start == 0 || end == 0 || ~end == 0)
-			 continue;
-		pci_addr_cache_insert(dev, start, end, flags);
-		inserted = 1;
-	}
-
-	/* If there was nothing to add, the cache has no reference... */
-	if (!inserted)
-		pci_dev_put(dev);
-}
-
-/**
- * pci_addr_cache_insert_device - Add a device to the address cache
- * @dev: PCI device whose I/O addresses we are interested in.
- *
- * In order to support the fast lookup of devices based on addresses,
- * we maintain a cache of devices that can be quickly searched.
- * This routine adds a device to that cache.
- */
-static void pci_addr_cache_insert_device(struct pci_dev *dev)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	__pci_addr_cache_insert_device(dev);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-}
-
-static inline void __pci_addr_cache_remove_device(struct pci_dev *dev)
-{
-	struct rb_node *n;
-	int removed = 0;
-
-restart:
-	n = rb_first(&pci_io_addr_cache_root.rb_root);
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-
-		if (piar->pcidev == dev) {
-			rb_erase(n, &pci_io_addr_cache_root.rb_root);
-			removed = 1;
-			kfree(piar);
-			goto restart;
-		}
-		n = rb_next(n);
-	}
-
-	/* The cache no longer holds its reference to this device... */
-	if (removed)
-		pci_dev_put(dev);
-}
-
-/**
- * pci_addr_cache_remove_device - remove pci device from addr cache
- * @dev: device to remove
- *
- * Remove a device from the addr-cache tree.
- * This is potentially expensive, since it will walk
- * the tree multiple times (once per resource).
- * But so what; device removal doesn't need to be that fast.
- */
-static void pci_addr_cache_remove_device(struct pci_dev *dev)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	__pci_addr_cache_remove_device(dev);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-}
-
-/**
- * pci_addr_cache_build - Build a cache of I/O addresses
- *
- * Build a cache of pci i/o addresses.  This cache will be used to
- * find the pci device that corresponds to a given address.
- * This routine scans all pci busses to build the cache.
- * Must be run late in boot process, after the pci controllers
- * have been scaned for devices (after all device resources are known).
- */
-void __init pci_addr_cache_build(void)
-{
-	struct pci_dev *dev = NULL;
-
-	if (!eeh_subsystem_enabled)
-		return;
-
-	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
-
-	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
-		/* Ignore PCI bridges ( XXX why ??) */
-		if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) {
-			continue;
-		}
-		pci_addr_cache_insert_device(dev);
-	}
-
-#ifdef DEBUG
-	/* Verify tree built up above, echo back the list of addrs. */
-	pci_addr_cache_print(&pci_io_addr_cache_root);
-#endif
-}
-
-/* --------------------------------------------------------------- */
-/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */
-
-void eeh_slot_error_detail (struct pci_dn *pdn, int severity)
-{
-	unsigned long flags;
-	int rc;
-
-	/* Log the error with the rtas logger */
-	spin_lock_irqsave(&slot_errbuf_lock, flags);
-	memset(slot_errbuf, 0, eeh_error_buf_size);
-
-	rc = rtas_call(ibm_slot_error_detail,
-	               8, 1, NULL, pdn->eeh_config_addr,
-	               BUID_HI(pdn->phb->buid),
-	               BUID_LO(pdn->phb->buid), NULL, 0,
-	               virt_to_phys(slot_errbuf),
-	               eeh_error_buf_size,
-	               severity);
-
-	if (rc == 0)
-		log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0);
-	spin_unlock_irqrestore(&slot_errbuf_lock, flags);
-}
-
-/**
- * eeh_register_notifier - Register to find out about EEH events.
- * @nb: notifier block to callback on events
- */
-int eeh_register_notifier(struct notifier_block *nb)
-{
-	return notifier_chain_register(&eeh_notifier_chain, nb);
-}
-
-/**
- * eeh_unregister_notifier - Unregister to an EEH event notifier.
- * @nb: notifier block to callback on events
- */
-int eeh_unregister_notifier(struct notifier_block *nb)
-{
-	return notifier_chain_unregister(&eeh_notifier_chain, nb);
-}
-
-/**
- * read_slot_reset_state - Read the reset state of a device node's slot
- * @dn: device node to read
- * @rets: array to return results in
- */
-static int read_slot_reset_state(struct pci_dn *pdn, int rets[])
-{
-	int token, outputs;
-
-	if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) {
-		token = ibm_read_slot_reset_state2;
-		outputs = 4;
-	} else {
-		token = ibm_read_slot_reset_state;
-		rets[2] = 0; /* fake PE Unavailable info */
-		outputs = 3;
-	}
-
-	return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr,
-			 BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid));
-}
-
-/**
- * eeh_panic - call panic() for an eeh event that cannot be handled.
- * The philosophy of this routine is that it is better to panic and
- * halt the OS than it is to risk possible data corruption by
- * oblivious device drivers that don't know better.
- *
- * @dev pci device that had an eeh event
- * @reset_state current reset state of the device slot
- */
-static void eeh_panic(struct pci_dev *dev, int reset_state)
-{
-	/*
-	 * XXX We should create a separate sysctl for this.
-	 *
-	 * Since the panic_on_oops sysctl is used to halt the system
-	 * in light of potential corruption, we can use it here.
-	 */
-	if (panic_on_oops) {
-		struct device_node *dn = pci_device_to_OF_node(dev);
-		eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */);
-		panic("EEH: MMIO failure (%d) on device:%s\n", reset_state,
-		      pci_name(dev));
-	}
-	else {
-		__get_cpu_var(ignored_failures)++;
-		printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n",
-		       reset_state, pci_name(dev));
-	}
-}
-
-/**
- * eeh_event_handler - dispatch EEH events.  The detection of a frozen
- * slot can occur inside an interrupt, where it can be hard to do
- * anything about it.  The goal of this routine is to pull these
- * detection events out of the context of the interrupt handler, and
- * re-dispatch them for processing at a later time in a normal context.
- *
- * @dummy - unused
- */
-static void eeh_event_handler(void *dummy)
-{
-	unsigned long flags;
-	struct eeh_event	*event;
-
-	while (1) {
-		spin_lock_irqsave(&eeh_eventlist_lock, flags);
-		event = NULL;
-		if (!list_empty(&eeh_eventlist)) {
-			event = list_entry(eeh_eventlist.next, struct eeh_event, list);
-			list_del(&event->list);
-		}
-		spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
-		if (event == NULL)
-			break;
-
-		printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device "
-		       "%s\n", event->reset_state,
-		       pci_name(event->dev));
-
-		notifier_call_chain (&eeh_notifier_chain,
-				     EEH_NOTIFY_FREEZE, event);
-
-		pci_dev_put(event->dev);
-		kfree(event);
-	}
-}
-
-/**
- * eeh_token_to_phys - convert EEH address token to phys address
- * @token i/o token, should be address in the form 0xA....
- */
-static inline unsigned long eeh_token_to_phys(unsigned long token)
-{
-	pte_t *ptep;
-	unsigned long pa;
-
-	ptep = find_linux_pte(init_mm.pgd, token);
-	if (!ptep)
-		return token;
-	pa = pte_pfn(*ptep) << PAGE_SHIFT;
-
-	return pa | (token & (PAGE_SIZE-1));
-}
-
-/** 
- * Return the "partitionable endpoint" (pe) under which this device lies
- */
-static struct device_node * find_device_pe(struct device_node *dn)
-{
-	while ((dn->parent) && PCI_DN(dn->parent) &&
-	      (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
-		dn = dn->parent;
-	}
-	return dn;
-}
-
-/** Mark all devices that are peers of this device as failed.
- *  Mark the device driver too, so that it can see the failure
- *  immediately; this is critical, since some drivers poll
- *  status registers in interrupts ... If a driver is polling,
- *  and the slot is frozen, then the driver can deadlock in
- *  an interrupt context, which is bad.
- */
-
-static inline void __eeh_mark_slot (struct device_node *dn)
-{
-	while (dn) {
-		PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED;
-
-		if (dn->child)
-			__eeh_mark_slot (dn->child);
-		dn = dn->sibling;
-	}
-}
-
-static inline void __eeh_clear_slot (struct device_node *dn)
-{
-	while (dn) {
-		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED;
-		if (dn->child)
-			__eeh_clear_slot (dn->child);
-		dn = dn->sibling;
-	}
-}
-
-static inline void eeh_clear_slot (struct device_node *dn)
-{
-	unsigned long flags;
-	spin_lock_irqsave(&confirm_error_lock, flags);
-	__eeh_clear_slot (dn);
-	spin_unlock_irqrestore(&confirm_error_lock, flags);
-}
-
-/**
- * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze
- * @dn device node
- * @dev pci device, if known
- *
- * Check for an EEH failure for the given device node.  Call this
- * routine if the result of a read was all 0xff's and you want to
- * find out if this is due to an EEH slot freeze.  This routine
- * will query firmware for the EEH status.
- *
- * Returns 0 if there has not been an EEH error; otherwise returns
- * a non-zero value and queues up a slot isolation event notification.
- *
- * It is safe to call this routine in an interrupt context.
- */
-int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev)
-{
-	int ret;
-	int rets[3];
-	unsigned long flags;
-	int reset_state;
-	struct eeh_event  *event;
-	struct pci_dn *pdn;
-	struct device_node *pe_dn;
-	int rc = 0;
-
-	__get_cpu_var(total_mmio_ffs)++;
-
-	if (!eeh_subsystem_enabled)
-		return 0;
-
-	if (!dn) {
-		__get_cpu_var(no_dn)++;
-		return 0;
-	}
-	pdn = PCI_DN(dn);
-
-	/* Access to IO BARs might get this far and still not want checking. */
-	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
-	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
-		__get_cpu_var(ignored_check)++;
-#ifdef DEBUG
-		printk ("EEH:ignored check (%x) for %s %s\n", 
-		        pdn->eeh_mode, pci_name (dev), dn->full_name);
-#endif
-		return 0;
-	}
-
-	if (!pdn->eeh_config_addr) {
-		__get_cpu_var(no_cfg_addr)++;
-		return 0;
-	}
-
-	/* If we already have a pending isolation event for this
-	 * slot, we know it's bad already, we don't need to check.
-	 * Do this checking under a lock; as multiple PCI devices
-	 * in one slot might report errors simultaneously, and we
-	 * only want one error recovery routine running.
-	 */
-	spin_lock_irqsave(&confirm_error_lock, flags);
-	rc = 1;
-	if (pdn->eeh_mode & EEH_MODE_ISOLATED) {
-		pdn->eeh_check_count ++;
-		if (pdn->eeh_check_count >= EEH_MAX_FAILS) {
-			printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n",
-			        pdn->eeh_check_count);
-			dump_stack();
-			
-			/* re-read the slot reset state */
-			if (read_slot_reset_state(pdn, rets) != 0)
-				rets[0] = -1;	/* reset state unknown */
-
-			/* If we are here, then we hit an infinite loop. Stop. */
-			panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev));
-		}
-		goto dn_unlock;
-	}
-
-	/*
-	 * Now test for an EEH failure.  This is VERY expensive.
-	 * Note that the eeh_config_addr may be a parent device
-	 * in the case of a device behind a bridge, or it may be
-	 * function zero of a multi-function device.
-	 * In any case they must share a common PHB.
-	 */
-	ret = read_slot_reset_state(pdn, rets);
-
-	/* If the call to firmware failed, punt */
-	if (ret != 0) {
-		printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n",
-		       ret, dn->full_name);
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	/* If EEH is not supported on this device, punt. */
-	if (rets[1] != 1) {
-		printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n",
-		       ret, dn->full_name);
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	/* If not the kind of error we know about, punt. */
-	if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) {
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	/* Note that config-io to empty slots may fail;
-	 * we recognize empty because they don't have children. */
-	if ((rets[0] == 5) && (dn->child == NULL)) {
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	__get_cpu_var(slot_resets)++;
- 
-	/* Avoid repeated reports of this failure, including problems
-	 * with other functions on this device, and functions under
-	 * bridges. */
-	pe_dn = find_device_pe (dn);
-	__eeh_mark_slot (pe_dn);
-	spin_unlock_irqrestore(&confirm_error_lock, flags);
-
-	reset_state = rets[0];
-
-	eeh_slot_error_detail (pdn, 1 /* Temporary Error */);
-
-	printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n",
-	       rets[0], dn->name, dn->full_name);
-	event = kmalloc(sizeof(*event), GFP_ATOMIC);
-	if (event == NULL) {
-		eeh_panic(dev, reset_state);
-		return 1;
- 	}
-
-	event->dev = dev;
-	event->dn = dn;
-	event->reset_state = reset_state;
-
-	/* We may or may not be called in an interrupt context */
-	spin_lock_irqsave(&eeh_eventlist_lock, flags);
-	list_add(&event->list, &eeh_eventlist);
-	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
-
-	/* Most EEH events are due to device driver bugs.  Having
-	 * a stack trace will help the device-driver authors figure
-	 * out what happened.  So print that out. */
-	if (rets[0] != 5) dump_stack();
-	schedule_work(&eeh_event_wq);
-
-	return 1;
-
-dn_unlock:
-	spin_unlock_irqrestore(&confirm_error_lock, flags);
-	return rc;
-}
-
-EXPORT_SYMBOL_GPL(eeh_dn_check_failure);
-
-/**
- * eeh_check_failure - check if all 1's data is due to EEH slot freeze
- * @token i/o token, should be address in the form 0xA....
- * @val value, should be all 1's (XXX why do we need this arg??)
- *
- * Check for an EEH failure at the given token address.  Call this
- * routine if the result of a read was all 0xff's and you want to
- * find out if this is due to an EEH slot freeze event.  This routine
- * will query firmware for the EEH status.
- *
- * Note this routine is safe to call in an interrupt context.
- */
-unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
-{
-	unsigned long addr;
-	struct pci_dev *dev;
-	struct device_node *dn;
-
-	/* Finding the phys addr + pci device; this is pretty quick. */
-	addr = eeh_token_to_phys((unsigned long __force) token);
-	dev = pci_get_device_by_addr(addr);
-	if (!dev) {
-		__get_cpu_var(no_device)++;
-		return val;
-	}
-
-	dn = pci_device_to_OF_node(dev);
-	eeh_dn_check_failure (dn, dev);
-
-	pci_dev_put(dev);
-	return val;
-}
-
-EXPORT_SYMBOL(eeh_check_failure);
-
-struct eeh_early_enable_info {
-	unsigned int buid_hi;
-	unsigned int buid_lo;
-};
-
-/* Enable eeh for the given device node. */
-static void *early_enable_eeh(struct device_node *dn, void *data)
-{
-	struct eeh_early_enable_info *info = data;
-	int ret;
-	char *status = get_property(dn, "status", NULL);
-	u32 *class_code = (u32 *)get_property(dn, "class-code", NULL);
-	u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL);
-	u32 *device_id = (u32 *)get_property(dn, "device-id", NULL);
-	u32 *regs;
-	int enable;
-	struct pci_dn *pdn = PCI_DN(dn);
-
-	pdn->eeh_mode = 0;
-	pdn->eeh_check_count = 0;
-	pdn->eeh_freeze_count = 0;
-
-	if (status && strcmp(status, "ok") != 0)
-		return NULL;	/* ignore devices with bad status */
-
-	/* Ignore bad nodes. */
-	if (!class_code || !vendor_id || !device_id)
-		return NULL;
-
-	/* There is nothing to check on PCI to ISA bridges */
-	if (dn->type && !strcmp(dn->type, "isa")) {
-		pdn->eeh_mode |= EEH_MODE_NOCHECK;
-		return NULL;
-	}
-
-	/*
-	 * Now decide if we are going to "Disable" EEH checking
-	 * for this device.  We still run with the EEH hardware active,
-	 * but we won't be checking for ff's.  This means a driver
-	 * could return bad data (very bad!), an interrupt handler could
-	 * hang waiting on status bits that won't change, etc.
-	 * But there are a few cases like display devices that make sense.
-	 */
-	enable = 1;	/* i.e. we will do checking */
-	if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY)
-		enable = 0;
-
-	if (!enable)
-		pdn->eeh_mode |= EEH_MODE_NOCHECK;
-
-	/* Ok... see if this device supports EEH.  Some do, some don't,
-	 * and the only way to find out is to check each and every one. */
-	regs = (u32 *)get_property(dn, "reg", NULL);
-	if (regs) {
-		/* First register entry is addr (00BBSS00)  */
-		/* Try to enable eeh */
-		ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL,
-				regs[0], info->buid_hi, info->buid_lo,
-				EEH_ENABLE);
-		if (ret == 0) {
-			eeh_subsystem_enabled = 1;
-			pdn->eeh_mode |= EEH_MODE_SUPPORTED;
-			pdn->eeh_config_addr = regs[0];
-#ifdef DEBUG
-			printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name);
-#endif
-		} else {
-
-			/* This device doesn't support EEH, but it may have an
-			 * EEH parent, in which case we mark it as supported. */
-			if (dn->parent && PCI_DN(dn->parent)
-			    && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
-				/* Parent supports EEH. */
-				pdn->eeh_mode |= EEH_MODE_SUPPORTED;
-				pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr;
-				return NULL;
-			}
-		}
-	} else {
-		printk(KERN_WARNING "EEH: %s: unable to get reg property.\n",
-		       dn->full_name);
-	}
-
-	return NULL;
-}
-
-/*
- * Initialize EEH by trying to enable it for all of the adapters in the system.
- * As a side effect we can determine here if eeh is supported at all.
- * Note that we leave EEH on so failed config cycles won't cause a machine
- * check.  If a user turns off EEH for a particular adapter they are really
- * telling Linux to ignore errors.  Some hardware (e.g. POWER5) won't
- * grant access to a slot if EEH isn't enabled, and so we always enable
- * EEH for all slots/all devices.
- *
- * The eeh-force-off option disables EEH checking globally, for all slots.
- * Even if force-off is set, the EEH hardware is still enabled, so that
- * newer systems can boot.
- */
-void __init eeh_init(void)
-{
-	struct device_node *phb, *np;
-	struct eeh_early_enable_info info;
-
-	spin_lock_init(&confirm_error_lock);
-	spin_lock_init(&slot_errbuf_lock);
-
-	np = of_find_node_by_path("/rtas");
-	if (np == NULL)
-		return;
-
-	ibm_set_eeh_option = rtas_token("ibm,set-eeh-option");
-	ibm_set_slot_reset = rtas_token("ibm,set-slot-reset");
-	ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2");
-	ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state");
-	ibm_slot_error_detail = rtas_token("ibm,slot-error-detail");
-
-	if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE)
-		return;
-
-	eeh_error_buf_size = rtas_token("rtas-error-log-max");
-	if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
-		eeh_error_buf_size = 1024;
-	}
-	if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) {
-		printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated "
-		      "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX);
-		eeh_error_buf_size = RTAS_ERROR_LOG_MAX;
-	}
-
-	/* Enable EEH for all adapters.  Note that eeh requires buid's */
-	for (phb = of_find_node_by_name(NULL, "pci"); phb;
-	     phb = of_find_node_by_name(phb, "pci")) {
-		unsigned long buid;
-
-		buid = get_phb_buid(phb);
-		if (buid == 0 || PCI_DN(phb) == NULL)
-			continue;
-
-		info.buid_lo = BUID_LO(buid);
-		info.buid_hi = BUID_HI(buid);
-		traverse_pci_devices(phb, early_enable_eeh, &info);
-	}
-
-	if (eeh_subsystem_enabled)
-		printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n");
-	else
-		printk(KERN_WARNING "EEH: No capable adapters found\n");
-}
-
-/**
- * eeh_add_device_early - enable EEH for the indicated device_node
- * @dn: device node for which to set up EEH
- *
- * This routine must be used to perform EEH initialization for PCI
- * devices that were added after system boot (e.g. hotplug, dlpar).
- * This routine must be called before any i/o is performed to the
- * adapter (inluding any config-space i/o).
- * Whether this actually enables EEH or not for this device depends
- * on the CEC architecture, type of the device, on earlier boot
- * command-line arguments & etc.
- */
-void eeh_add_device_early(struct device_node *dn)
-{
-	struct pci_controller *phb;
-	struct eeh_early_enable_info info;
-
-	if (!dn || !PCI_DN(dn))
-		return;
-	phb = PCI_DN(dn)->phb;
-	if (NULL == phb || 0 == phb->buid) {
-		printk(KERN_WARNING "EEH: Expected buid but found none for %s\n",
-		       dn->full_name);
-		dump_stack();
-		return;
-	}
-
-	info.buid_hi = BUID_HI(phb->buid);
-	info.buid_lo = BUID_LO(phb->buid);
-	early_enable_eeh(dn, &info);
-}
-EXPORT_SYMBOL_GPL(eeh_add_device_early);
-
-/**
- * eeh_add_device_late - perform EEH initialization for the indicated pci device
- * @dev: pci device for which to set up EEH
- *
- * This routine must be used to complete EEH initialization for PCI
- * devices that were added after system boot (e.g. hotplug, dlpar).
- */
-void eeh_add_device_late(struct pci_dev *dev)
-{
-	struct device_node *dn;
-
-	if (!dev || !eeh_subsystem_enabled)
-		return;
-
-#ifdef DEBUG
-	printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev));
-#endif
-
-	pci_dev_get (dev);
-	dn = pci_device_to_OF_node(dev);
-	PCI_DN(dn)->pcidev = dev;
-
-	pci_addr_cache_insert_device (dev);
-}
-EXPORT_SYMBOL_GPL(eeh_add_device_late);
-
-/**
- * eeh_remove_device - undo EEH setup for the indicated pci device
- * @dev: pci device to be removed
- *
- * This routine should be when a device is removed from a running
- * system (e.g. by hotplug or dlpar).
- */
-void eeh_remove_device(struct pci_dev *dev)
-{
-	struct device_node *dn;
-	if (!dev || !eeh_subsystem_enabled)
-		return;
-
-	/* Unregister the device with the EEH/PCI address search system */
-#ifdef DEBUG
-	printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev));
-#endif
-	pci_addr_cache_remove_device(dev);
-
-	dn = pci_device_to_OF_node(dev);
-	PCI_DN(dn)->pcidev = NULL;
-	pci_dev_put (dev);
-}
-EXPORT_SYMBOL_GPL(eeh_remove_device);
-
-static int proc_eeh_show(struct seq_file *m, void *v)
-{
-	unsigned int cpu;
-	unsigned long ffs = 0, positives = 0, failures = 0;
-	unsigned long resets = 0;
-	unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;
-
-	for_each_cpu(cpu) {
-		ffs += per_cpu(total_mmio_ffs, cpu);
-		positives += per_cpu(false_positives, cpu);
-		failures += per_cpu(ignored_failures, cpu);
-		resets += per_cpu(slot_resets, cpu);
-		no_dev += per_cpu(no_device, cpu);
-		no_dn += per_cpu(no_dn, cpu);
-		no_cfg += per_cpu(no_cfg_addr, cpu);
-		no_check += per_cpu(ignored_check, cpu);
-	}
-
-	if (0 == eeh_subsystem_enabled) {
-		seq_printf(m, "EEH Subsystem is globally disabled\n");
-		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
-	} else {
-		seq_printf(m, "EEH Subsystem is enabled\n");
-		seq_printf(m,
-				"no device=%ld\n"
-				"no device node=%ld\n"
-				"no config address=%ld\n"
-				"check not wanted=%ld\n"
-				"eeh_total_mmio_ffs=%ld\n"
-				"eeh_false_positives=%ld\n"
-				"eeh_ignored_failures=%ld\n"
-				"eeh_slot_resets=%ld\n",
-				no_dev, no_dn, no_cfg, no_check,
-				ffs, positives, failures, resets);
-	}
-
-	return 0;
-}
-
-static int proc_eeh_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, proc_eeh_show, NULL);
-}
-
-static struct file_operations proc_eeh_operations = {
-	.open      = proc_eeh_open,
-	.read      = seq_read,
-	.llseek    = seq_lseek,
-	.release   = single_release,
-};
-
-static int __init eeh_init_proc(void)
-{
-	struct proc_dir_entry *e;
-
-	if (systemcfg->platform & PLATFORM_PSERIES) {
-		e = create_proc_entry("ppc64/eeh", 0, NULL);
-		if (e)
-			e->proc_fops = &proc_eeh_operations;
-	}
-
-	return 0;
-}
-__initcall(eeh_init_proc);
Index: linux-2.6.14-git3/arch/ppc64/kernel/Makefile
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/Makefile	2005-11-02 14:29:22.485829789 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/Makefile	2005-11-02 14:30:49.805589414 -0600
@@ -35,7 +35,6 @@
 			 bpa_iic.o spider-pic.o
 
 obj-$(CONFIG_KEXEC)		+= machine_kexec.o
-obj-$(CONFIG_EEH)		+= eeh.o
 obj-$(CONFIG_PROC_FS)		+= proc_ppc64.o
 obj-$(CONFIG_RTAS_FLASH)	+= rtas_flash.o
 obj-$(CONFIG_SMP)		+= smp.o
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile	2005-10-31 11:19:47.000000000 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:31:36.150092654 -0600
@@ -3,3 +3,4 @@
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_IBMVIO)	+= vio.o
 obj-$(CONFIG_XICS)	+= xics.o
+obj-$(CONFIG_EEH)    += eeh.o
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:30:49.790591516 -0600
@@ -0,0 +1,1093 @@
+/*
+ * eeh.c
+ * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/notifier.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/rbtree.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
+#include <asm/atomic.h>
+#include <asm/eeh.h>
+#include <asm/io.h>
+#include <asm/machdep.h>
+#include <asm/rtas.h>
+#include <asm/atomic.h>
+#include <asm/systemcfg.h>
+#include <asm/ppc-pci.h>
+
+#undef DEBUG
+
+/** Overview:
+ *  EEH, or "Extended Error Handling" is a PCI bridge technology for
+ *  dealing with PCI bus errors that can't be dealt with within the
+ *  usual PCI framework, except by check-stopping the CPU.  Systems
+ *  that are designed for high-availability/reliability cannot afford
+ *  to crash due to a "mere" PCI error, thus the need for EEH.
+ *  An EEH-capable bridge operates by converting a detected error
+ *  into a "slot freeze", taking the PCI adapter off-line, making
+ *  the slot behave, from the OS'es point of view, as if the slot
+ *  were "empty": all reads return 0xff's and all writes are silently
+ *  ignored.  EEH slot isolation events can be triggered by parity
+ *  errors on the address or data busses (e.g. during posted writes),
+ *  which in turn might be caused by low voltage on the bus, dust,
+ *  vibration, humidity, radioactivity or plain-old failed hardware.
+ *
+ *  Note, however, that one of the leading causes of EEH slot
+ *  freeze events are buggy device drivers, buggy device microcode,
+ *  or buggy device hardware.  This is because any attempt by the
+ *  device to bus-master data to a memory address that is not
+ *  assigned to the device will trigger a slot freeze.   (The idea
+ *  is to prevent devices-gone-wild from corrupting system memory).
+ *  Buggy hardware/drivers will have a miserable time co-existing
+ *  with EEH.
+ *
+ *  Ideally, a PCI device driver, when suspecting that an isolation
+ *  event has occured (e.g. by reading 0xff's), will then ask EEH
+ *  whether this is the case, and then take appropriate steps to
+ *  reset the PCI slot, the PCI device, and then resume operations.
+ *  However, until that day,  the checking is done here, with the
+ *  eeh_check_failure() routine embedded in the MMIO macros.  If
+ *  the slot is found to be isolated, an "EEH Event" is synthesized
+ *  and sent out for processing.
+ */
+
+/* EEH event workqueue setup. */
+static DEFINE_SPINLOCK(eeh_eventlist_lock);
+LIST_HEAD(eeh_eventlist);
+static void eeh_event_handler(void *);
+DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL);
+
+static struct notifier_block *eeh_notifier_chain;
+
+/* If a device driver keeps reading an MMIO register in an interrupt
+ * handler after a slot isolation event has occurred, we assume it
+ * is broken and panic.  This sets the threshold for how many read
+ * attempts we allow before panicking.
+ */
+#define EEH_MAX_FAILS	100000
+
+/* RTAS tokens */
+static int ibm_set_eeh_option;
+static int ibm_set_slot_reset;
+static int ibm_read_slot_reset_state;
+static int ibm_read_slot_reset_state2;
+static int ibm_slot_error_detail;
+
+static int eeh_subsystem_enabled;
+
+/* Lock to avoid races due to multiple reports of an error */
+static DEFINE_SPINLOCK(confirm_error_lock);
+
+/* Buffer for reporting slot-error-detail rtas calls */
+static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
+static DEFINE_SPINLOCK(slot_errbuf_lock);
+static int eeh_error_buf_size;
+
+/* System monitoring statistics */
+static DEFINE_PER_CPU(unsigned long, no_device);
+static DEFINE_PER_CPU(unsigned long, no_dn);
+static DEFINE_PER_CPU(unsigned long, no_cfg_addr);
+static DEFINE_PER_CPU(unsigned long, ignored_check);
+static DEFINE_PER_CPU(unsigned long, total_mmio_ffs);
+static DEFINE_PER_CPU(unsigned long, false_positives);
+static DEFINE_PER_CPU(unsigned long, ignored_failures);
+static DEFINE_PER_CPU(unsigned long, slot_resets);
+
+/**
+ * The pci address cache subsystem.  This subsystem places
+ * PCI device address resources into a red-black tree, sorted
+ * according to the address range, so that given only an i/o
+ * address, the corresponding PCI device can be **quickly**
+ * found. It is safe to perform an address lookup in an interrupt
+ * context; this ability is an important feature.
+ *
+ * Currently, the only customer of this code is the EEH subsystem;
+ * thus, this code has been somewhat tailored to suit EEH better.
+ * In particular, the cache does *not* hold the addresses of devices
+ * for which EEH is not enabled.
+ *
+ * (Implementation Note: The RB tree seems to be better/faster
+ * than any hash algo I could think of for this problem, even
+ * with the penalty of slow pointer chases for d-cache misses).
+ */
+struct pci_io_addr_range
+{
+	struct rb_node rb_node;
+	unsigned long addr_lo;
+	unsigned long addr_hi;
+	struct pci_dev *pcidev;
+	unsigned int flags;
+};
+
+static struct pci_io_addr_cache
+{
+	struct rb_root rb_root;
+	spinlock_t piar_lock;
+} pci_io_addr_cache_root;
+
+static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr)
+{
+	struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node;
+
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+
+		if (addr < piar->addr_lo) {
+			n = n->rb_left;
+		} else {
+			if (addr > piar->addr_hi) {
+				n = n->rb_right;
+			} else {
+				pci_dev_get(piar->pcidev);
+				return piar->pcidev;
+			}
+		}
+	}
+
+	return NULL;
+}
+
+/**
+ * pci_get_device_by_addr - Get device, given only address
+ * @addr: mmio (PIO) phys address or i/o port number
+ *
+ * Given an mmio phys address, or a port number, find a pci device
+ * that implements this address.  Be sure to pci_dev_put the device
+ * when finished.  I/O port numbers are assumed to be offset
+ * from zero (that is, they do *not* have pci_io_addr added in).
+ * It is safe to call this function within an interrupt.
+ */
+static struct pci_dev *pci_get_device_by_addr(unsigned long addr)
+{
+	struct pci_dev *dev;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	dev = __pci_get_device_by_addr(addr);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+	return dev;
+}
+
+#ifdef DEBUG
+/*
+ * Handy-dandy debug print routine, does nothing more
+ * than print out the contents of our addr cache.
+ */
+static void pci_addr_cache_print(struct pci_io_addr_cache *cache)
+{
+	struct rb_node *n;
+	int cnt = 0;
+
+	n = rb_first(&cache->rb_root);
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+		printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n",
+		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
+		       piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev));
+		cnt++;
+		n = rb_next(n);
+	}
+}
+#endif
+
+/* Insert address range into the rb tree. */
+static struct pci_io_addr_range *
+pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
+		      unsigned long ahi, unsigned int flags)
+{
+	struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct pci_io_addr_range *piar;
+
+	/* Walk tree, find a place to insert into tree */
+	while (*p) {
+		parent = *p;
+		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
+		if (ahi < piar->addr_lo) {
+			p = &parent->rb_left;
+		} else if (alo > piar->addr_hi) {
+			p = &parent->rb_right;
+		} else {
+			if (dev != piar->pcidev ||
+			    alo != piar->addr_lo || ahi != piar->addr_hi) {
+				printk(KERN_WARNING "PIAR: overlapping address range\n");
+			}
+			return piar;
+		}
+	}
+	piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
+	if (!piar)
+		return NULL;
+
+	piar->addr_lo = alo;
+	piar->addr_hi = ahi;
+	piar->pcidev = dev;
+	piar->flags = flags;
+
+#ifdef DEBUG
+	printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n",
+	                  alo, ahi, pci_name (dev));
+#endif
+
+	rb_link_node(&piar->rb_node, parent, p);
+	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
+
+	return piar;
+}
+
+static void __pci_addr_cache_insert_device(struct pci_dev *dev)
+{
+	struct device_node *dn;
+	struct pci_dn *pdn;
+	int i;
+	int inserted = 0;
+
+	dn = pci_device_to_OF_node(dev);
+	if (!dn) {
+		printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev));
+		return;
+	}
+
+	/* Skip any devices for which EEH is not enabled. */
+	pdn = PCI_DN(dn);
+	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
+	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
+#ifdef DEBUG
+		printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n",
+		       pci_name(dev), pdn->node->full_name);
+#endif
+		return;
+	}
+
+	/* The cache holds a reference to the device... */
+	pci_dev_get(dev);
+
+	/* Walk resources on this device, poke them into the tree */
+	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+		unsigned long start = pci_resource_start(dev,i);
+		unsigned long end = pci_resource_end(dev,i);
+		unsigned int flags = pci_resource_flags(dev,i);
+
+		/* We are interested only bus addresses, not dma or other stuff */
+		if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM)))
+			continue;
+		if (start == 0 || ~start == 0 || end == 0 || ~end == 0)
+			 continue;
+		pci_addr_cache_insert(dev, start, end, flags);
+		inserted = 1;
+	}
+
+	/* If there was nothing to add, the cache has no reference... */
+	if (!inserted)
+		pci_dev_put(dev);
+}
+
+/**
+ * pci_addr_cache_insert_device - Add a device to the address cache
+ * @dev: PCI device whose I/O addresses we are interested in.
+ *
+ * In order to support the fast lookup of devices based on addresses,
+ * we maintain a cache of devices that can be quickly searched.
+ * This routine adds a device to that cache.
+ */
+static void pci_addr_cache_insert_device(struct pci_dev *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	__pci_addr_cache_insert_device(dev);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+}
+
+static inline void __pci_addr_cache_remove_device(struct pci_dev *dev)
+{
+	struct rb_node *n;
+	int removed = 0;
+
+restart:
+	n = rb_first(&pci_io_addr_cache_root.rb_root);
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+
+		if (piar->pcidev == dev) {
+			rb_erase(n, &pci_io_addr_cache_root.rb_root);
+			removed = 1;
+			kfree(piar);
+			goto restart;
+		}
+		n = rb_next(n);
+	}
+
+	/* The cache no longer holds its reference to this device... */
+	if (removed)
+		pci_dev_put(dev);
+}
+
+/**
+ * pci_addr_cache_remove_device - remove pci device from addr cache
+ * @dev: device to remove
+ *
+ * Remove a device from the addr-cache tree.
+ * This is potentially expensive, since it will walk
+ * the tree multiple times (once per resource).
+ * But so what; device removal doesn't need to be that fast.
+ */
+static void pci_addr_cache_remove_device(struct pci_dev *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	__pci_addr_cache_remove_device(dev);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+}
+
+/**
+ * pci_addr_cache_build - Build a cache of I/O addresses
+ *
+ * Build a cache of pci i/o addresses.  This cache will be used to
+ * find the pci device that corresponds to a given address.
+ * This routine scans all pci busses to build the cache.
+ * Must be run late in boot process, after the pci controllers
+ * have been scaned for devices (after all device resources are known).
+ */
+void __init pci_addr_cache_build(void)
+{
+	struct pci_dev *dev = NULL;
+
+	if (!eeh_subsystem_enabled)
+		return;
+
+	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
+
+	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
+		/* Ignore PCI bridges ( XXX why ??) */
+		if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) {
+			continue;
+		}
+		pci_addr_cache_insert_device(dev);
+	}
+
+#ifdef DEBUG
+	/* Verify tree built up above, echo back the list of addrs. */
+	pci_addr_cache_print(&pci_io_addr_cache_root);
+#endif
+}
+
+/* --------------------------------------------------------------- */
+/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */
+
+void eeh_slot_error_detail (struct pci_dn *pdn, int severity)
+{
+	unsigned long flags;
+	int rc;
+
+	/* Log the error with the rtas logger */
+	spin_lock_irqsave(&slot_errbuf_lock, flags);
+	memset(slot_errbuf, 0, eeh_error_buf_size);
+
+	rc = rtas_call(ibm_slot_error_detail,
+	               8, 1, NULL, pdn->eeh_config_addr,
+	               BUID_HI(pdn->phb->buid),
+	               BUID_LO(pdn->phb->buid), NULL, 0,
+	               virt_to_phys(slot_errbuf),
+	               eeh_error_buf_size,
+	               severity);
+
+	if (rc == 0)
+		log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0);
+	spin_unlock_irqrestore(&slot_errbuf_lock, flags);
+}
+
+/**
+ * eeh_register_notifier - Register to find out about EEH events.
+ * @nb: notifier block to callback on events
+ */
+int eeh_register_notifier(struct notifier_block *nb)
+{
+	return notifier_chain_register(&eeh_notifier_chain, nb);
+}
+
+/**
+ * eeh_unregister_notifier - Unregister to an EEH event notifier.
+ * @nb: notifier block to callback on events
+ */
+int eeh_unregister_notifier(struct notifier_block *nb)
+{
+	return notifier_chain_unregister(&eeh_notifier_chain, nb);
+}
+
+/**
+ * read_slot_reset_state - Read the reset state of a device node's slot
+ * @dn: device node to read
+ * @rets: array to return results in
+ */
+static int read_slot_reset_state(struct pci_dn *pdn, int rets[])
+{
+	int token, outputs;
+
+	if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) {
+		token = ibm_read_slot_reset_state2;
+		outputs = 4;
+	} else {
+		token = ibm_read_slot_reset_state;
+		rets[2] = 0; /* fake PE Unavailable info */
+		outputs = 3;
+	}
+
+	return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr,
+			 BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid));
+}
+
+/**
+ * eeh_panic - call panic() for an eeh event that cannot be handled.
+ * The philosophy of this routine is that it is better to panic and
+ * halt the OS than it is to risk possible data corruption by
+ * oblivious device drivers that don't know better.
+ *
+ * @dev pci device that had an eeh event
+ * @reset_state current reset state of the device slot
+ */
+static void eeh_panic(struct pci_dev *dev, int reset_state)
+{
+	/*
+	 * XXX We should create a separate sysctl for this.
+	 *
+	 * Since the panic_on_oops sysctl is used to halt the system
+	 * in light of potential corruption, we can use it here.
+	 */
+	if (panic_on_oops) {
+		struct device_node *dn = pci_device_to_OF_node(dev);
+		eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */);
+		panic("EEH: MMIO failure (%d) on device:%s\n", reset_state,
+		      pci_name(dev));
+	}
+	else {
+		__get_cpu_var(ignored_failures)++;
+		printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n",
+		       reset_state, pci_name(dev));
+	}
+}
+
+/**
+ * eeh_event_handler - dispatch EEH events.  The detection of a frozen
+ * slot can occur inside an interrupt, where it can be hard to do
+ * anything about it.  The goal of this routine is to pull these
+ * detection events out of the context of the interrupt handler, and
+ * re-dispatch them for processing at a later time in a normal context.
+ *
+ * @dummy - unused
+ */
+static void eeh_event_handler(void *dummy)
+{
+	unsigned long flags;
+	struct eeh_event	*event;
+
+	while (1) {
+		spin_lock_irqsave(&eeh_eventlist_lock, flags);
+		event = NULL;
+		if (!list_empty(&eeh_eventlist)) {
+			event = list_entry(eeh_eventlist.next, struct eeh_event, list);
+			list_del(&event->list);
+		}
+		spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
+		if (event == NULL)
+			break;
+
+		printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device "
+		       "%s\n", event->reset_state,
+		       pci_name(event->dev));
+
+		notifier_call_chain (&eeh_notifier_chain,
+				     EEH_NOTIFY_FREEZE, event);
+
+		pci_dev_put(event->dev);
+		kfree(event);
+	}
+}
+
+/**
+ * eeh_token_to_phys - convert EEH address token to phys address
+ * @token i/o token, should be address in the form 0xA....
+ */
+static inline unsigned long eeh_token_to_phys(unsigned long token)
+{
+	pte_t *ptep;
+	unsigned long pa;
+
+	ptep = find_linux_pte(init_mm.pgd, token);
+	if (!ptep)
+		return token;
+	pa = pte_pfn(*ptep) << PAGE_SHIFT;
+
+	return pa | (token & (PAGE_SIZE-1));
+}
+
+/** 
+ * Return the "partitionable endpoint" (pe) under which this device lies
+ */
+static struct device_node * find_device_pe(struct device_node *dn)
+{
+	while ((dn->parent) && PCI_DN(dn->parent) &&
+	      (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
+		dn = dn->parent;
+	}
+	return dn;
+}
+
+/** Mark all devices that are peers of this device as failed.
+ *  Mark the device driver too, so that it can see the failure
+ *  immediately; this is critical, since some drivers poll
+ *  status registers in interrupts ... If a driver is polling,
+ *  and the slot is frozen, then the driver can deadlock in
+ *  an interrupt context, which is bad.
+ */
+
+static inline void __eeh_mark_slot (struct device_node *dn)
+{
+	while (dn) {
+		PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED;
+
+		if (dn->child)
+			__eeh_mark_slot (dn->child);
+		dn = dn->sibling;
+	}
+}
+
+static inline void __eeh_clear_slot (struct device_node *dn)
+{
+	while (dn) {
+		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED;
+		if (dn->child)
+			__eeh_clear_slot (dn->child);
+		dn = dn->sibling;
+	}
+}
+
+static inline void eeh_clear_slot (struct device_node *dn)
+{
+	unsigned long flags;
+	spin_lock_irqsave(&confirm_error_lock, flags);
+	__eeh_clear_slot (dn);
+	spin_unlock_irqrestore(&confirm_error_lock, flags);
+}
+
+/**
+ * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze
+ * @dn device node
+ * @dev pci device, if known
+ *
+ * Check for an EEH failure for the given device node.  Call this
+ * routine if the result of a read was all 0xff's and you want to
+ * find out if this is due to an EEH slot freeze.  This routine
+ * will query firmware for the EEH status.
+ *
+ * Returns 0 if there has not been an EEH error; otherwise returns
+ * a non-zero value and queues up a slot isolation event notification.
+ *
+ * It is safe to call this routine in an interrupt context.
+ */
+int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev)
+{
+	int ret;
+	int rets[3];
+	unsigned long flags;
+	int reset_state;
+	struct eeh_event  *event;
+	struct pci_dn *pdn;
+	struct device_node *pe_dn;
+	int rc = 0;
+
+	__get_cpu_var(total_mmio_ffs)++;
+
+	if (!eeh_subsystem_enabled)
+		return 0;
+
+	if (!dn) {
+		__get_cpu_var(no_dn)++;
+		return 0;
+	}
+	pdn = PCI_DN(dn);
+
+	/* Access to IO BARs might get this far and still not want checking. */
+	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
+	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
+		__get_cpu_var(ignored_check)++;
+#ifdef DEBUG
+		printk ("EEH:ignored check (%x) for %s %s\n", 
+		        pdn->eeh_mode, pci_name (dev), dn->full_name);
+#endif
+		return 0;
+	}
+
+	if (!pdn->eeh_config_addr) {
+		__get_cpu_var(no_cfg_addr)++;
+		return 0;
+	}
+
+	/* If we already have a pending isolation event for this
+	 * slot, we know it's bad already, we don't need to check.
+	 * Do this checking under a lock; as multiple PCI devices
+	 * in one slot might report errors simultaneously, and we
+	 * only want one error recovery routine running.
+	 */
+	spin_lock_irqsave(&confirm_error_lock, flags);
+	rc = 1;
+	if (pdn->eeh_mode & EEH_MODE_ISOLATED) {
+		pdn->eeh_check_count ++;
+		if (pdn->eeh_check_count >= EEH_MAX_FAILS) {
+			printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n",
+			        pdn->eeh_check_count);
+			dump_stack();
+			
+			/* re-read the slot reset state */
+			if (read_slot_reset_state(pdn, rets) != 0)
+				rets[0] = -1;	/* reset state unknown */
+
+			/* If we are here, then we hit an infinite loop. Stop. */
+			panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev));
+		}
+		goto dn_unlock;
+	}
+
+	/*
+	 * Now test for an EEH failure.  This is VERY expensive.
+	 * Note that the eeh_config_addr may be a parent device
+	 * in the case of a device behind a bridge, or it may be
+	 * function zero of a multi-function device.
+	 * In any case they must share a common PHB.
+	 */
+	ret = read_slot_reset_state(pdn, rets);
+
+	/* If the call to firmware failed, punt */
+	if (ret != 0) {
+		printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n",
+		       ret, dn->full_name);
+		__get_cpu_var(false_positives)++;
+		rc = 0;
+		goto dn_unlock;
+	}
+
+	/* If EEH is not supported on this device, punt. */
+	if (rets[1] != 1) {
+		printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n",
+		       ret, dn->full_name);
+		__get_cpu_var(false_positives)++;
+		rc = 0;
+		goto dn_unlock;
+	}
+
+	/* If not the kind of error we know about, punt. */
+	if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) {
+		__get_cpu_var(false_positives)++;
+		rc = 0;
+		goto dn_unlock;
+	}
+
+	/* Note that config-io to empty slots may fail;
+	 * we recognize empty because they don't have children. */
+	if ((rets[0] == 5) && (dn->child == NULL)) {
+		__get_cpu_var(false_positives)++;
+		rc = 0;
+		goto dn_unlock;
+	}
+
+	__get_cpu_var(slot_resets)++;
+ 
+	/* Avoid repeated reports of this failure, including problems
+	 * with other functions on this device, and functions under
+	 * bridges. */
+	pe_dn = find_device_pe (dn);
+	__eeh_mark_slot (pe_dn);
+	spin_unlock_irqrestore(&confirm_error_lock, flags);
+
+	reset_state = rets[0];
+
+	eeh_slot_error_detail (pdn, 1 /* Temporary Error */);
+
+	printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n",
+	       rets[0], dn->name, dn->full_name);
+	event = kmalloc(sizeof(*event), GFP_ATOMIC);
+	if (event == NULL) {
+		eeh_panic(dev, reset_state);
+		return 1;
+ 	}
+
+	event->dev = dev;
+	event->dn = dn;
+	event->reset_state = reset_state;
+
+	/* We may or may not be called in an interrupt context */
+	spin_lock_irqsave(&eeh_eventlist_lock, flags);
+	list_add(&event->list, &eeh_eventlist);
+	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
+
+	/* Most EEH events are due to device driver bugs.  Having
+	 * a stack trace will help the device-driver authors figure
+	 * out what happened.  So print that out. */
+	if (rets[0] != 5) dump_stack();
+	schedule_work(&eeh_event_wq);
+
+	return 1;
+
+dn_unlock:
+	spin_unlock_irqrestore(&confirm_error_lock, flags);
+	return rc;
+}
+
+EXPORT_SYMBOL_GPL(eeh_dn_check_failure);
+
+/**
+ * eeh_check_failure - check if all 1's data is due to EEH slot freeze
+ * @token i/o token, should be address in the form 0xA....
+ * @val value, should be all 1's (XXX why do we need this arg??)
+ *
+ * Check for an EEH failure at the given token address.  Call this
+ * routine if the result of a read was all 0xff's and you want to
+ * find out if this is due to an EEH slot freeze event.  This routine
+ * will query firmware for the EEH status.
+ *
+ * Note this routine is safe to call in an interrupt context.
+ */
+unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
+{
+	unsigned long addr;
+	struct pci_dev *dev;
+	struct device_node *dn;
+
+	/* Finding the phys addr + pci device; this is pretty quick. */
+	addr = eeh_token_to_phys((unsigned long __force) token);
+	dev = pci_get_device_by_addr(addr);
+	if (!dev) {
+		__get_cpu_var(no_device)++;
+		return val;
+	}
+
+	dn = pci_device_to_OF_node(dev);
+	eeh_dn_check_failure (dn, dev);
+
+	pci_dev_put(dev);
+	return val;
+}
+
+EXPORT_SYMBOL(eeh_check_failure);
+
+struct eeh_early_enable_info {
+	unsigned int buid_hi;
+	unsigned int buid_lo;
+};
+
+/* Enable eeh for the given device node. */
+static void *early_enable_eeh(struct device_node *dn, void *data)
+{
+	struct eeh_early_enable_info *info = data;
+	int ret;
+	char *status = get_property(dn, "status", NULL);
+	u32 *class_code = (u32 *)get_property(dn, "class-code", NULL);
+	u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL);
+	u32 *device_id = (u32 *)get_property(dn, "device-id", NULL);
+	u32 *regs;
+	int enable;
+	struct pci_dn *pdn = PCI_DN(dn);
+
+	pdn->eeh_mode = 0;
+	pdn->eeh_check_count = 0;
+	pdn->eeh_freeze_count = 0;
+
+	if (status && strcmp(status, "ok") != 0)
+		return NULL;	/* ignore devices with bad status */
+
+	/* Ignore bad nodes. */
+	if (!class_code || !vendor_id || !device_id)
+		return NULL;
+
+	/* There is nothing to check on PCI to ISA bridges */
+	if (dn->type && !strcmp(dn->type, "isa")) {
+		pdn->eeh_mode |= EEH_MODE_NOCHECK;
+		return NULL;
+	}
+
+	/*
+	 * Now decide if we are going to "Disable" EEH checking
+	 * for this device.  We still run with the EEH hardware active,
+	 * but we won't be checking for ff's.  This means a driver
+	 * could return bad data (very bad!), an interrupt handler could
+	 * hang waiting on status bits that won't change, etc.
+	 * But there are a few cases like display devices that make sense.
+	 */
+	enable = 1;	/* i.e. we will do checking */
+	if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY)
+		enable = 0;
+
+	if (!enable)
+		pdn->eeh_mode |= EEH_MODE_NOCHECK;
+
+	/* Ok... see if this device supports EEH.  Some do, some don't,
+	 * and the only way to find out is to check each and every one. */
+	regs = (u32 *)get_property(dn, "reg", NULL);
+	if (regs) {
+		/* First register entry is addr (00BBSS00)  */
+		/* Try to enable eeh */
+		ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL,
+				regs[0], info->buid_hi, info->buid_lo,
+				EEH_ENABLE);
+		if (ret == 0) {
+			eeh_subsystem_enabled = 1;
+			pdn->eeh_mode |= EEH_MODE_SUPPORTED;
+			pdn->eeh_config_addr = regs[0];
+#ifdef DEBUG
+			printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name);
+#endif
+		} else {
+
+			/* This device doesn't support EEH, but it may have an
+			 * EEH parent, in which case we mark it as supported. */
+			if (dn->parent && PCI_DN(dn->parent)
+			    && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
+				/* Parent supports EEH. */
+				pdn->eeh_mode |= EEH_MODE_SUPPORTED;
+				pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr;
+				return NULL;
+			}
+		}
+	} else {
+		printk(KERN_WARNING "EEH: %s: unable to get reg property.\n",
+		       dn->full_name);
+	}
+
+	return NULL;
+}
+
+/*
+ * Initialize EEH by trying to enable it for all of the adapters in the system.
+ * As a side effect we can determine here if eeh is supported at all.
+ * Note that we leave EEH on so failed config cycles won't cause a machine
+ * check.  If a user turns off EEH for a particular adapter they are really
+ * telling Linux to ignore errors.  Some hardware (e.g. POWER5) won't
+ * grant access to a slot if EEH isn't enabled, and so we always enable
+ * EEH for all slots/all devices.
+ *
+ * The eeh-force-off option disables EEH checking globally, for all slots.
+ * Even if force-off is set, the EEH hardware is still enabled, so that
+ * newer systems can boot.
+ */
+void __init eeh_init(void)
+{
+	struct device_node *phb, *np;
+	struct eeh_early_enable_info info;
+
+	spin_lock_init(&confirm_error_lock);
+	spin_lock_init(&slot_errbuf_lock);
+
+	np = of_find_node_by_path("/rtas");
+	if (np == NULL)
+		return;
+
+	ibm_set_eeh_option = rtas_token("ibm,set-eeh-option");
+	ibm_set_slot_reset = rtas_token("ibm,set-slot-reset");
+	ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2");
+	ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state");
+	ibm_slot_error_detail = rtas_token("ibm,slot-error-detail");
+
+	if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE)
+		return;
+
+	eeh_error_buf_size = rtas_token("rtas-error-log-max");
+	if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
+		eeh_error_buf_size = 1024;
+	}
+	if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) {
+		printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated "
+		      "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX);
+		eeh_error_buf_size = RTAS_ERROR_LOG_MAX;
+	}
+
+	/* Enable EEH for all adapters.  Note that eeh requires buid's */
+	for (phb = of_find_node_by_name(NULL, "pci"); phb;
+	     phb = of_find_node_by_name(phb, "pci")) {
+		unsigned long buid;
+
+		buid = get_phb_buid(phb);
+		if (buid == 0 || PCI_DN(phb) == NULL)
+			continue;
+
+		info.buid_lo = BUID_LO(buid);
+		info.buid_hi = BUID_HI(buid);
+		traverse_pci_devices(phb, early_enable_eeh, &info);
+	}
+
+	if (eeh_subsystem_enabled)
+		printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n");
+	else
+		printk(KERN_WARNING "EEH: No capable adapters found\n");
+}
+
+/**
+ * eeh_add_device_early - enable EEH for the indicated device_node
+ * @dn: device node for which to set up EEH
+ *
+ * This routine must be used to perform EEH initialization for PCI
+ * devices that were added after system boot (e.g. hotplug, dlpar).
+ * This routine must be called before any i/o is performed to the
+ * adapter (inluding any config-space i/o).
+ * Whether this actually enables EEH or not for this device depends
+ * on the CEC architecture, type of the device, on earlier boot
+ * command-line arguments & etc.
+ */
+void eeh_add_device_early(struct device_node *dn)
+{
+	struct pci_controller *phb;
+	struct eeh_early_enable_info info;
+
+	if (!dn || !PCI_DN(dn))
+		return;
+	phb = PCI_DN(dn)->phb;
+	if (NULL == phb || 0 == phb->buid) {
+		printk(KERN_WARNING "EEH: Expected buid but found none for %s\n",
+		       dn->full_name);
+		dump_stack();
+		return;
+	}
+
+	info.buid_hi = BUID_HI(phb->buid);
+	info.buid_lo = BUID_LO(phb->buid);
+	early_enable_eeh(dn, &info);
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_early);
+
+/**
+ * eeh_add_device_late - perform EEH initialization for the indicated pci device
+ * @dev: pci device for which to set up EEH
+ *
+ * This routine must be used to complete EEH initialization for PCI
+ * devices that were added after system boot (e.g. hotplug, dlpar).
+ */
+void eeh_add_device_late(struct pci_dev *dev)
+{
+	struct device_node *dn;
+
+	if (!dev || !eeh_subsystem_enabled)
+		return;
+
+#ifdef DEBUG
+	printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev));
+#endif
+
+	pci_dev_get (dev);
+	dn = pci_device_to_OF_node(dev);
+	PCI_DN(dn)->pcidev = dev;
+
+	pci_addr_cache_insert_device (dev);
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_late);
+
+/**
+ * eeh_remove_device - undo EEH setup for the indicated pci device
+ * @dev: pci device to be removed
+ *
+ * This routine should be when a device is removed from a running
+ * system (e.g. by hotplug or dlpar).
+ */
+void eeh_remove_device(struct pci_dev *dev)
+{
+	struct device_node *dn;
+	if (!dev || !eeh_subsystem_enabled)
+		return;
+
+	/* Unregister the device with the EEH/PCI address search system */
+#ifdef DEBUG
+	printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev));
+#endif
+	pci_addr_cache_remove_device(dev);
+
+	dn = pci_device_to_OF_node(dev);
+	PCI_DN(dn)->pcidev = NULL;
+	pci_dev_put (dev);
+}
+EXPORT_SYMBOL_GPL(eeh_remove_device);
+
+static int proc_eeh_show(struct seq_file *m, void *v)
+{
+	unsigned int cpu;
+	unsigned long ffs = 0, positives = 0, failures = 0;
+	unsigned long resets = 0;
+	unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;
+
+	for_each_cpu(cpu) {
+		ffs += per_cpu(total_mmio_ffs, cpu);
+		positives += per_cpu(false_positives, cpu);
+		failures += per_cpu(ignored_failures, cpu);
+		resets += per_cpu(slot_resets, cpu);
+		no_dev += per_cpu(no_device, cpu);
+		no_dn += per_cpu(no_dn, cpu);
+		no_cfg += per_cpu(no_cfg_addr, cpu);
+		no_check += per_cpu(ignored_check, cpu);
+	}
+
+	if (0 == eeh_subsystem_enabled) {
+		seq_printf(m, "EEH Subsystem is globally disabled\n");
+		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
+	} else {
+		seq_printf(m, "EEH Subsystem is enabled\n");
+		seq_printf(m,
+				"no device=%ld\n"
+				"no device node=%ld\n"
+				"no config address=%ld\n"
+				"check not wanted=%ld\n"
+				"eeh_total_mmio_ffs=%ld\n"
+				"eeh_false_positives=%ld\n"
+				"eeh_ignored_failures=%ld\n"
+				"eeh_slot_resets=%ld\n",
+				no_dev, no_dn, no_cfg, no_check,
+				ffs, positives, failures, resets);
+	}
+
+	return 0;
+}
+
+static int proc_eeh_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, proc_eeh_show, NULL);
+}
+
+static struct file_operations proc_eeh_operations = {
+	.open      = proc_eeh_open,
+	.read      = seq_read,
+	.llseek    = seq_lseek,
+	.release   = single_release,
+};
+
+static int __init eeh_init_proc(void)
+{
+	struct proc_dir_entry *e;
+
+	if (systemcfg->platform & PLATFORM_PSERIES) {
+		e = create_proc_entry("ppc64/eeh", 0, NULL);
+		if (e)
+			e->proc_fops = &proc_eeh_operations;
+	}
+
+	return 0;
+}
+__initcall(eeh_init_proc);

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 12/42]: ppc64: PCI error event dispatcher
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (10 preceding siblings ...)
  2005-11-04  0:49 ` [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64 Linas Vepstas
@ 2005-11-04  0:50 ` Linas Vepstas
  2005-11-04  0:50 ` [PATCH 13/42]: ppc64: PCI reset support routines Linas Vepstas
                   ` (31 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:50 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

12-eeh-event-dispatcher.patch

ppc64: EEH Recovery dispatcher thread

This patch adds a mechanism to create recovery threads when an
EEH event is received.  Since an EEH freeze state may be detected 
within an interrupt context, we need to get out of the interrupt
context before starting recovery. This dispatcher does this in 
two steps: first, it uses a workqueue to get out, and then 
lanuches a kernel thread, so that the recovery routine can 
sleep for exteded periods without upseting the keventd.

A kernel thread is created with each EEH event, rather than 
having one long-running daemon started at boot time.  This is 
because it is anticipated that EEH events will be very rare 
(very very rare, ideally) and so its pointless to cluter the 
process tables with a daemon that will almost never run.


Signed-off-by: Linas Vepstas <linas@austin.ibm.com>


Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:30:49.790591516 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:32:35.713742506 -0600
@@ -19,7 +19,6 @@
 
 #include <linux/init.h>
 #include <linux/list.h>
-#include <linux/notifier.h>
 #include <linux/pci.h>
 #include <linux/proc_fs.h>
 #include <linux/rbtree.h>
@@ -27,12 +26,12 @@
 #include <linux/spinlock.h>
 #include <asm/atomic.h>
 #include <asm/eeh.h>
+#include <asm/eeh_event.h>
 #include <asm/io.h>
 #include <asm/machdep.h>
+#include <asm/ppc-pci.h>
 #include <asm/rtas.h>
-#include <asm/atomic.h>
 #include <asm/systemcfg.h>
-#include <asm/ppc-pci.h>
 
 #undef DEBUG
 
@@ -70,14 +69,6 @@
  *  and sent out for processing.
  */
 
-/* EEH event workqueue setup. */
-static DEFINE_SPINLOCK(eeh_eventlist_lock);
-LIST_HEAD(eeh_eventlist);
-static void eeh_event_handler(void *);
-DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL);
-
-static struct notifier_block *eeh_notifier_chain;
-
 /* If a device driver keeps reading an MMIO register in an interrupt
  * handler after a slot isolation event has occurred, we assume it
  * is broken and panic.  This sets the threshold for how many read
@@ -421,24 +412,6 @@
 }
 
 /**
- * eeh_register_notifier - Register to find out about EEH events.
- * @nb: notifier block to callback on events
- */
-int eeh_register_notifier(struct notifier_block *nb)
-{
-	return notifier_chain_register(&eeh_notifier_chain, nb);
-}
-
-/**
- * eeh_unregister_notifier - Unregister to an EEH event notifier.
- * @nb: notifier block to callback on events
- */
-int eeh_unregister_notifier(struct notifier_block *nb)
-{
-	return notifier_chain_unregister(&eeh_notifier_chain, nb);
-}
-
-/**
  * read_slot_reset_state - Read the reset state of a device node's slot
  * @dn: device node to read
  * @rets: array to return results in
@@ -461,73 +434,6 @@
 }
 
 /**
- * eeh_panic - call panic() for an eeh event that cannot be handled.
- * The philosophy of this routine is that it is better to panic and
- * halt the OS than it is to risk possible data corruption by
- * oblivious device drivers that don't know better.
- *
- * @dev pci device that had an eeh event
- * @reset_state current reset state of the device slot
- */
-static void eeh_panic(struct pci_dev *dev, int reset_state)
-{
-	/*
-	 * XXX We should create a separate sysctl for this.
-	 *
-	 * Since the panic_on_oops sysctl is used to halt the system
-	 * in light of potential corruption, we can use it here.
-	 */
-	if (panic_on_oops) {
-		struct device_node *dn = pci_device_to_OF_node(dev);
-		eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */);
-		panic("EEH: MMIO failure (%d) on device:%s\n", reset_state,
-		      pci_name(dev));
-	}
-	else {
-		__get_cpu_var(ignored_failures)++;
-		printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n",
-		       reset_state, pci_name(dev));
-	}
-}
-
-/**
- * eeh_event_handler - dispatch EEH events.  The detection of a frozen
- * slot can occur inside an interrupt, where it can be hard to do
- * anything about it.  The goal of this routine is to pull these
- * detection events out of the context of the interrupt handler, and
- * re-dispatch them for processing at a later time in a normal context.
- *
- * @dummy - unused
- */
-static void eeh_event_handler(void *dummy)
-{
-	unsigned long flags;
-	struct eeh_event	*event;
-
-	while (1) {
-		spin_lock_irqsave(&eeh_eventlist_lock, flags);
-		event = NULL;
-		if (!list_empty(&eeh_eventlist)) {
-			event = list_entry(eeh_eventlist.next, struct eeh_event, list);
-			list_del(&event->list);
-		}
-		spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
-		if (event == NULL)
-			break;
-
-		printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device "
-		       "%s\n", event->reset_state,
-		       pci_name(event->dev));
-
-		notifier_call_chain (&eeh_notifier_chain,
-				     EEH_NOTIFY_FREEZE, event);
-
-		pci_dev_put(event->dev);
-		kfree(event);
-	}
-}
-
-/**
  * eeh_token_to_phys - convert EEH address token to phys address
  * @token i/o token, should be address in the form 0xA....
  */
@@ -613,8 +519,6 @@
 	int ret;
 	int rets[3];
 	unsigned long flags;
-	int reset_state;
-	struct eeh_event  *event;
 	struct pci_dn *pdn;
 	struct device_node *pe_dn;
 	int rc = 0;
@@ -722,33 +626,12 @@
 	__eeh_mark_slot (pe_dn);
 	spin_unlock_irqrestore(&confirm_error_lock, flags);
 
-	reset_state = rets[0];
-
-	eeh_slot_error_detail (pdn, 1 /* Temporary Error */);
-
-	printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n",
-	       rets[0], dn->name, dn->full_name);
-	event = kmalloc(sizeof(*event), GFP_ATOMIC);
-	if (event == NULL) {
-		eeh_panic(dev, reset_state);
-		return 1;
- 	}
-
-	event->dev = dev;
-	event->dn = dn;
-	event->reset_state = reset_state;
-
-	/* We may or may not be called in an interrupt context */
-	spin_lock_irqsave(&eeh_eventlist_lock, flags);
-	list_add(&event->list, &eeh_eventlist);
-	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
-
+	eeh_send_failure_event (dn, dev, rets[0], rets[2]);
+	
 	/* Most EEH events are due to device driver bugs.  Having
 	 * a stack trace will help the device-driver authors figure
 	 * out what happened.  So print that out. */
 	if (rets[0] != 5) dump_stack();
-	schedule_work(&eeh_event_wq);
-
 	return 1;
 
 dn_unlock:
@@ -793,6 +676,14 @@
 
 EXPORT_SYMBOL(eeh_check_failure);
 
+/* ------------------------------------------------------------- */
+/* The code below deals with enabling EEH for devices during  the
+ * early boot sequence.  EEH must be enabled before any PCI probing
+ * can be done.
+ */
+
+#define EEH_ENABLE 1
+
 struct eeh_early_enable_info {
 	unsigned int buid_hi;
 	unsigned int buid_lo;
@@ -850,8 +741,9 @@
 		/* First register entry is addr (00BBSS00)  */
 		/* Try to enable eeh */
 		ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL,
-				regs[0], info->buid_hi, info->buid_lo,
-				EEH_ENABLE);
+		                regs[0], info->buid_hi, info->buid_lo,
+		                EEH_ENABLE);
+
 		if (ret == 0) {
 			eeh_subsystem_enabled = 1;
 			pdn->eeh_mode |= EEH_MODE_SUPPORTED;
Index: linux-2.6.14-git3/include/asm-powerpc/eeh_event.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.14-git3/include/asm-powerpc/eeh_event.h	2005-11-02 14:32:35.718741805 -0600
@@ -0,0 +1,52 @@
+/*
+ *	eeh_event.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Copyright (c) 2005 Linas Vepstas <linas@linas.org>
+ */
+
+#ifndef ASM_PPC64_EEH_EVENT_H
+#define ASM_PPC64_EEH_EVENT_H
+
+/** EEH event -- structure holding pci controller data that describes
+ *  a change in the isolation status of a PCI slot.  A pointer
+ *  to this struct is passed as the data pointer in a notify callback.
+ */
+struct eeh_event {
+	struct list_head     list;
+	struct device_node 	*dn;   /* struct device node */
+	struct pci_dev       *dev;  /* affected device */
+	int                  state;
+	int time_unavail;    /* milliseconds until device might be available */
+};
+
+/**
+ * eeh_send_failure_event - generate a PCI error event
+ * @dev pci device
+ *
+ * This routine builds a PCI error event which will be delivered
+ * to all listeners on the peh_notifier_chain.
+ *
+ * This routine can be called within an interrupt context;
+ * the actual event will be delivered in a normal context
+ * (from a workqueue).
+ */
+int eeh_send_failure_event (struct device_node *dn,
+                            struct pci_dev *dev,
+                            int reset_state,
+                            int time_unavail);
+
+#endif /* ASM_PPC64_EEH_EVENT_H */
Index: linux-2.6.14-git3/include/asm-ppc64/eeh.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-ppc64/eeh.h	2005-11-02 14:29:21.496968403 -0600
+++ linux-2.6.14-git3/include/asm-ppc64/eeh.h	2005-11-02 14:32:35.725740824 -0600
@@ -1,4 +1,4 @@
-/* 
+/*
  * eeh.h
  * Copyright (C) 2001  Dave Engebretsen & Todd Inglett IBM Corporation.
  *
@@ -6,12 +6,12 @@
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
- * 
+ *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
- * 
+ *
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
@@ -27,8 +27,6 @@
 
 struct pci_dev;
 struct device_node;
-struct device_node;
-struct notifier_block;
 
 #ifdef CONFIG_EEH
 
@@ -37,6 +35,10 @@
 #define EEH_MODE_NOCHECK	(1<<1)
 #define EEH_MODE_ISOLATED	(1<<2)
 
+/* Max number of EEH freezes allowed before we consider the device
+ * to be permanently disabled. */
+#define EEH_MAX_ALLOWED_FREEZES 5
+
 void __init eeh_init(void);
 unsigned long eeh_check_failure(const volatile void __iomem *token,
 				unsigned long val);
@@ -59,36 +61,14 @@
  * eeh_remove_device - undo EEH setup for the indicated pci device
  * @dev: pci device to be removed
  *
- * This routine should be when a device is removed from a running
- * system (e.g. by hotplug or dlpar).
+ * This routine should be called when a device is removed from
+ * a running system (e.g. by hotplug or dlpar).  It unregisters
+ * the PCI device from the EEH subsystem.  I/O errors affecting
+ * this device will no longer be detected after this call; thus,
+ * i/o errors affecting this slot may leave this device unusable.
  */
 void eeh_remove_device(struct pci_dev *);
 
-#define EEH_DISABLE		0
-#define EEH_ENABLE		1
-#define EEH_RELEASE_LOADSTORE	2
-#define EEH_RELEASE_DMA		3
-
-/**
- * Notifier event flags.
- */
-#define EEH_NOTIFY_FREEZE  1
-
-/** EEH event -- structure holding pci slot data that describes
- *  a change in the isolation status of a PCI slot.  A pointer
- *  to this struct is passed as the data pointer in a notify callback.
- */
-struct eeh_event {
-	struct list_head     list;
-	struct pci_dev       *dev;
-	struct device_node   *dn;
-	int                  reset_state;
-};
-
-/** Register to find out about EEH events. */
-int eeh_register_notifier(struct notifier_block *nb);
-int eeh_unregister_notifier(struct notifier_block *nb);
-
 /**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
  *
@@ -129,7 +109,7 @@
 #define EEH_IO_ERROR_VALUE(size) (-1UL)
 #endif /* CONFIG_EEH */
 
-/* 
+/*
  * MMIO read/write operations with EEH support.
  */
 static inline u8 eeh_readb(const volatile void __iomem *addr)
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c	2005-11-02 14:32:35.731739983 -0600
@@ -0,0 +1,155 @@
+/*
+ * eeh_event.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ * Copyright (c) 2005 Linas Vepstas <linas@linas.org>
+ */
+
+#include <linux/list.h>
+#include <linux/pci.h>
+#include <asm/eeh_event.h>
+
+/** Overview:
+ *  EEH error states may be detected within exception handlers;
+ *  however, the recovery processing needs to occur asynchronously
+ *  in a normal kernel context and not an interrupt context.
+ *  This pair of routines creates an event and queues it onto a
+ *  work-queue, where a worker thread can drive recovery.
+ */
+
+/* EEH event workqueue setup. */
+static spinlock_t eeh_eventlist_lock = SPIN_LOCK_UNLOCKED;
+LIST_HEAD(eeh_eventlist);
+static void eeh_thread_launcher(void *);
+DECLARE_WORK(eeh_event_wq, eeh_thread_launcher, NULL);
+
+/**
+ * eeh_panic - call panic() for an eeh event that cannot be handled.
+ * The philosophy of this routine is that it is better to panic and
+ * halt the OS than it is to risk possible data corruption by
+ * oblivious device drivers that don't know better.
+ *
+ * @dev pci device that had an eeh event
+ * @reset_state current reset state of the device slot
+ */
+static void eeh_panic(struct pci_dev *dev, int reset_state)
+{
+	/*
+	 * Since the panic_on_oops sysctl is used to halt the system
+	 * in light of potential corruption, we can use it here.
+	 */
+	if (panic_on_oops) {
+		panic("EEH: MMIO failure (%d) on device:%s\n", reset_state,
+		      pci_name(dev));
+	}
+	else {
+		printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n",
+		       reset_state, pci_name(dev));
+	}
+}
+
+/**
+ * eeh_event_handler - dispatch EEH events.  The detection of a frozen
+ * slot can occur inside an interrupt, where it can be hard to do
+ * anything about it.  The goal of this routine is to pull these
+ * detection events out of the context of the interrupt handler, and
+ * re-dispatch them for processing at a later time in a normal context.
+ *
+ * @dummy - unused
+ */
+static int eeh_event_handler(void * dummy)
+{
+	unsigned long flags;
+	struct eeh_event	*event;
+
+	daemonize ("eehd");
+
+	while (1) {
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		spin_lock_irqsave(&eeh_eventlist_lock, flags);
+		event = NULL;
+		if (!list_empty(&eeh_eventlist)) {
+			event = list_entry(eeh_eventlist.next, struct eeh_event, list);
+			list_del(&event->list);
+		}
+		spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
+		if (event == NULL)
+			break;
+
+		printk(KERN_INFO "EEH: Detected PCI bus error on device %s\n",
+		       pci_name(event->dev));
+
+		eeh_panic (event->dev, event->state);
+
+		kfree(event);
+	}
+
+	return 0;
+}
+
+/**
+ * eeh_thread_launcher
+ *
+ * @dummy - unused
+ */
+static void eeh_thread_launcher(void *dummy)
+{
+	if (kernel_thread(eeh_event_handler, NULL, CLONE_KERNEL) < 0)
+		printk(KERN_ERR "Failed to start EEH daemon\n");
+}
+
+/**
+ * eeh_send_failure_event - generate a PCI error event
+ * @dev pci device
+ *
+ * This routine can be called within an interrupt context;
+ * the actual event will be delivered in a normal context
+ * (from a workqueue).
+ */
+int eeh_send_failure_event (struct device_node *dn,
+                            struct pci_dev *dev,
+                            int state,
+                            int time_unavail)
+{
+	unsigned long flags;
+	struct eeh_event *event;
+
+	event = kmalloc(sizeof(*event), GFP_ATOMIC);
+	if (event == NULL) {
+		printk (KERN_ERR "EEH: out of memory, event not handled\n");
+		return 1;
+ 	}
+
+	if (dev)
+		pci_dev_get(dev);
+
+	event->dn = dn;
+	event->dev = dev;
+	event->state = state;
+	event->time_unavail = time_unavail;
+
+	/* We may or may not be called in an interrupt context */
+	spin_lock_irqsave(&eeh_eventlist_lock, flags);
+	list_add(&event->list, &eeh_eventlist);
+	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
+
+	schedule_work(&eeh_event_wq);
+
+	return 0;
+}
+
+/********************** END OF FILE ******************************/
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:31:36.150092654 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:32:55.306995693 -0600
@@ -3,4 +3,4 @@
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_IBMVIO)	+= vio.o
 obj-$(CONFIG_XICS)	+= xics.o
-obj-$(CONFIG_EEH)    += eeh.o
+obj-$(CONFIG_EEH)    += eeh.o eeh_event.o

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 13/42]: ppc64: PCI reset support routines
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (11 preceding siblings ...)
  2005-11-04  0:50 ` [PATCH 12/42]: ppc64: PCI error event dispatcher Linas Vepstas
@ 2005-11-04  0:50 ` Linas Vepstas
  2005-11-04  0:50 ` [PATCH 14/42]: ppc64: Save & restore of PCI device BARS Linas Vepstas
                   ` (30 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:50 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

13-eeh-recovery-support-routines.patch

EEH Recovery support routines

This patch adds routines required to help drive the recovery of
EEH-frozen slots.  The main function is to drive the PCI #RST
signal line high for a qurter of a second, and then allow for 
a second & a half of settle time.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>


Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h	2005-11-02 14:29:20.596094683 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h	2005-11-02 14:33:42.083437903 -0600
@@ -51,4 +51,18 @@
 extern unsigned long pci_assign_all_buses;
 extern int pci_read_irq_line(struct pci_dev *pci_dev);
 
+/* ---- EEH internal-use-only related routines ---- */
+#ifdef CONFIG_EEH
+/**
+ * rtas_set_slot_reset -- unfreeze a frozen slot
+ *
+ * Clear the EEH-frozen condition on a slot.  This routine
+ * does this by asserting the PCI #RST line for 1/8th of
+ * a second; this routine will sleep while the adapter is
+ * being reset.
+ */
+void rtas_set_slot_reset (struct pci_dn *);
+
+#endif
+
 #endif /* _ASM_POWERPC_PPC_PCI_H */
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:32:35.713742506 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:33:42.096436081 -0600
@@ -17,6 +17,7 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
 
+#include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include <linux/pci.h>
@@ -677,6 +678,104 @@
 EXPORT_SYMBOL(eeh_check_failure);
 
 /* ------------------------------------------------------------- */
+/* The code below deals with error recovery */
+
+/** Return negative value if a permanent error, else return
+ * a number of milliseconds to wait until the PCI slot is
+ * ready to be used.
+ */
+static int
+eeh_slot_availability(struct pci_dn *pdn)
+{
+	int rc;
+	int rets[3];
+
+	rc = read_slot_reset_state(pdn, rets);
+
+	if (rc) return rc;
+
+	if (rets[1] == 0) return -1;  /* EEH is not supported */
+	if (rets[0] == 0)  return 0;  /* Oll Korrect */
+	if (rets[0] == 5) {
+		if (rets[2] == 0) return -1; /* permanently unavailable */
+		return rets[2]; /* number of millisecs to wait */
+	}
+	return -1;
+}
+
+/** rtas_pci_slot_reset raises/lowers the pci #RST line
+ *  state: 1/0 to raise/lower the #RST
+ *
+ * Clear the EEH-frozen condition on a slot.  This routine
+ * asserts the PCI #RST line if the 'state' argument is '1',
+ * and drops the #RST line if 'state is '0'.  This routine is
+ * safe to call in an interrupt context.
+ *
+ */
+
+static void
+rtas_pci_slot_reset(struct pci_dn *pdn, int state)
+{
+	int rc;
+
+	BUG_ON (pdn==NULL); 
+
+	if (!pdn->phb) {
+		printk (KERN_WARNING "EEH: in slot reset, device node %s has no phb\n",
+		        pdn->node->full_name);
+		return;
+	}
+
+	rc = rtas_call(ibm_set_slot_reset,4,1, NULL,
+	               pdn->eeh_config_addr,
+	               BUID_HI(pdn->phb->buid),
+	               BUID_LO(pdn->phb->buid),
+	               state);
+	if (rc) {
+		printk (KERN_WARNING "EEH: Unable to reset the failed slot, (%d) #RST=%d dn=%s\n", 
+		        rc, state, pdn->node->full_name);
+		return;
+	}
+
+	if (state == 0)
+		eeh_clear_slot (pdn->node->parent->child);
+}
+
+/** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second
+ *  dn -- device node to be reset.
+ */
+
+void
+rtas_set_slot_reset(struct pci_dn *pdn)
+{
+	int i, rc;
+
+	rtas_pci_slot_reset (pdn, 1);
+
+	/* The PCI bus requires that the reset be held high for at least
+	 * a 100 milliseconds. We wait a bit longer 'just in case'.  */
+
+#define PCI_BUS_RST_HOLD_TIME_MSEC 250
+	msleep (PCI_BUS_RST_HOLD_TIME_MSEC);
+	rtas_pci_slot_reset (pdn, 0);
+
+	/* After a PCI slot has been reset, the PCI Express spec requires
+	 * a 1.5 second idle time for the bus to stabilize, before starting
+	 * up traffic. */
+#define PCI_BUS_SETTLE_TIME_MSEC 1800
+	msleep (PCI_BUS_SETTLE_TIME_MSEC);
+
+	/* Now double check with the firmware to make sure the device is
+	 * ready to be used; if not, wait for recovery. */
+	for (i=0; i<10; i++) {
+		rc = eeh_slot_availability (pdn);
+		if (rc <= 0) break;
+
+		msleep (rc+100);
+	}
+}
+
+/* ------------------------------------------------------------- */
 /* The code below deals with enabling EEH for devices during  the
  * early boot sequence.  EEH must be enabled before any PCI probing
  * can be done.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 14/42]: ppc64: Save & restore of PCI device BARS
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (12 preceding siblings ...)
  2005-11-04  0:50 ` [PATCH 13/42]: ppc64: PCI reset support routines Linas Vepstas
@ 2005-11-04  0:50 ` Linas Vepstas
  2005-11-04  0:50 ` [PATCH 15/42]: Documentation: PCI Error Recovery Linas Vepstas
                   ` (29 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:50 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

14-eeh-device-bar-save.patch

After a PCI device has been resest, the device BAR's and other config
space info must be restored to the same state as they were in when 
the firmware first handed us this device.  This will allow the 
PCI device driver, when restarted, to correctly recognize and set up
the device.

Tis patch saves the device config space as early as reasonable after
the firmware has handed over the device.  Te state resore funcion 
is inteded for use by the EEH recovery routines.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>


Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:33:42.096436081 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:34:19.926132452 -0600
@@ -77,6 +77,9 @@
  */
 #define EEH_MAX_FAILS	100000
 
+/* Misc forward declaraions */
+static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn);
+
 /* RTAS tokens */
 static int ibm_set_eeh_option;
 static int ibm_set_slot_reset;
@@ -366,6 +369,7 @@
  */
 void __init pci_addr_cache_build(void)
 {
+	struct device_node *dn;
 	struct pci_dev *dev = NULL;
 
 	if (!eeh_subsystem_enabled)
@@ -379,6 +383,10 @@
 			continue;
 		}
 		pci_addr_cache_insert_device(dev);
+
+		/* Save the BAR's; firmware doesn't restore these after EEH reset */
+		dn = pci_device_to_OF_node(dev);
+		eeh_save_bars(dev, PCI_DN(dn));
 	}
 
 #ifdef DEBUG
@@ -775,6 +783,108 @@
 	}
 }
 
+/* ------------------------------------------------------- */
+/** Save and restore of PCI BARs
+ *
+ * Although firmware will set up BARs during boot, it doesn't
+ * set up device BAR's after a device reset, although it will,
+ * if requested, set up bridge configuration. Thus, we need to
+ * configure the PCI devices ourselves.  
+ */
+
+/**
+ * __restore_bars - Restore the Base Address Registers
+ * Loads the PCI configuration space base address registers,
+ * the expansion ROM base address, the latency timer, and etc.
+ * from the saved values in the device node.
+ */
+static inline void __restore_bars (struct pci_dn *pdn)
+{
+	int i;
+
+	if (NULL==pdn->phb) return;
+	for (i=4; i<10; i++) {
+		rtas_write_config(pdn, i*4, 4, pdn->config_space[i]);
+	}
+
+	/* 12 == Expansion ROM Address */
+	rtas_write_config(pdn, 12*4, 4, pdn->config_space[12]);
+
+#define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF))
+#define SAVED_BYTE(OFF) (((u8 *)(pdn->config_space))[BYTE_SWAP(OFF)])
+
+	rtas_write_config (pdn, PCI_CACHE_LINE_SIZE, 1,
+	            SAVED_BYTE(PCI_CACHE_LINE_SIZE));
+
+	rtas_write_config (pdn, PCI_LATENCY_TIMER, 1,
+	            SAVED_BYTE(PCI_LATENCY_TIMER));
+
+	/* max latency, min grant, interrupt pin and line */
+	rtas_write_config(pdn, 15*4, 4, pdn->config_space[15]);
+}
+
+/**
+ * eeh_restore_bars - restore the PCI config space info
+ *
+ * This routine performs a recursive walk to the children
+ * of this device as well.
+ */
+void eeh_restore_bars(struct pci_dn *pdn)
+{
+	struct device_node *dn;
+	if (!pdn) 
+		return;
+	
+	if (! pdn->eeh_is_bridge)
+		__restore_bars (pdn);
+
+	dn = pdn->node->child;
+	while (dn) {
+		eeh_restore_bars (PCI_DN(dn));
+		dn = dn->sibling;
+	}
+}
+
+/**
+ * eeh_save_bars - save device bars
+ *
+ * Save the values of the device bars. Unlike the restore
+ * routine, this routine is *not* recursive. This is because
+ * PCI devices are added individuallly; but, for the restore,
+ * an entire slot is reset at a time.
+ */
+static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn)
+{
+	int i;
+
+	if (!pdev || !pdn )
+		return;
+	
+	for (i = 0; i < 16; i++)
+		pci_read_config_dword(pdev, i * 4, &pdn->config_space[i]);
+
+	if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+		pdn->eeh_is_bridge = 1;
+}
+
+void
+rtas_configure_bridge(struct pci_dn *pdn)
+{
+	int token = rtas_token ("ibm,configure-bridge");
+	int rc;
+
+	if (token == RTAS_UNKNOWN_SERVICE)
+		return;
+	rc = rtas_call(token,3,1, NULL,
+	               pdn->eeh_config_addr,
+	               BUID_HI(pdn->phb->buid),
+	               BUID_LO(pdn->phb->buid));
+	if (rc) {
+		printk (KERN_WARNING "EEH: Unable to configure device bridge (%d) for %s\n",
+		        rc, pdn->node->full_name);
+	}
+}
+
 /* ------------------------------------------------------------- */
 /* The code below deals with enabling EEH for devices during  the
  * early boot sequence.  EEH must be enabled before any PCI probing
@@ -977,6 +1087,7 @@
 void eeh_add_device_late(struct pci_dev *dev)
 {
 	struct device_node *dn;
+	struct pci_dn *pdn;
 
 	if (!dev || !eeh_subsystem_enabled)
 		return;
@@ -987,9 +1098,11 @@
 
 	pci_dev_get (dev);
 	dn = pci_device_to_OF_node(dev);
-	PCI_DN(dn)->pcidev = dev;
+	pdn = PCI_DN(dn);
+	pdn->pcidev = dev;
 
 	pci_addr_cache_insert_device (dev);
+	eeh_save_bars(dev, pdn);
 }
 EXPORT_SYMBOL_GPL(eeh_add_device_late);
 
Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h	2005-11-02 14:33:42.083437903 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h	2005-11-02 14:34:19.931131751 -0600
@@ -63,6 +63,29 @@
  */
 void rtas_set_slot_reset (struct pci_dn *);
 
+/** 
+ * eeh_restore_bars - Restore device configuration info.
+ *
+ * A reset of a PCI device will clear out its config space.
+ * This routines will restore the config space for this
+ * device, and is children, to values previously obtained
+ * from the firmware.
+ */
+void eeh_restore_bars(struct pci_dn *);
+
+/**
+ * rtas_configure_bridge -- firmware initialization of pci bridge
+ *
+ * Ask the firmware to configure all PCI bridges devices
+ * located behind the indicated node. Required after a
+ * pci device reset. Does essentially the same hing as
+ * eeh_restore_bars, but for brdges, and lets firmware 
+ * do the work.
+ */
+void rtas_configure_bridge(struct pci_dn *);
+
+int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
+
 #endif
 
 #endif /* _ASM_POWERPC_PPC_PCI_H */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 15/42]: Documentation:  PCI Error Recovery
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (13 preceding siblings ...)
  2005-11-04  0:50 ` [PATCH 14/42]: ppc64: Save & restore of PCI device BARS Linas Vepstas
@ 2005-11-04  0:50 ` Linas Vepstas
  2005-11-04  0:50 ` [PATCH 16/42]: PCI: PCI Error reporting callbacks Linas Vepstas
                   ` (28 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:50 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

215-pci-error-recovery_docs.patch

PCI Error Recovery: documentation patch

Various PCI bus errors can be signaled by newer PCI controllers.
Recovering from those errors requires an infrastructure to notify
affected device drivers of the error, and a way of walking through
a reset sequence.  This patch adds documentation describing the
current error recovery proposal.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

 Documentation/pci-error-recovery.txt |  246 +++++++++++++++++++++++++++++++++++
 MAINTAINERS                          |    7 
 2 files changed, 253 insertions(+)

Index: linux-2.6.14-git3/Documentation/pci-error-recovery.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.14-git3/Documentation/pci-error-recovery.txt	2005-11-02 14:34:25.663328101 -0600
@@ -0,0 +1,246 @@
+
+                       PCI Error Recovery
+                       ------------------
+                         May 31, 2005
+
+               Current document maintainer:
+           Linas Vepstas <linas@austin.ibm.com>
+
+
+Some PCI bus controllers are able to detect certain "hard" PCI errors
+on the bus, such as parity errors on the data and address busses, as
+well as SERR and PERR errors.  These chipsets are then able to disable
+I/O to/from the affected device, so that, for example, a bad DMA
+address doesn't end up corrupting system memory.  These same chipsets
+are also able to reset the affected PCI device, and return it to
+working condition.  This document describes a generic API form
+performing error recovery.
+
+The core idea is that after a PCI error has been detected, there must
+be a way for the kernel to coordinate with all affected device drivers
+so that the pci card can be made operational again, possibly after
+performing a full electrical #RST of the PCI card.  The API below
+provides a generic API for device drivers to be notified of PCI
+errors, and to be notified of, and respond to, a reset sequence.
+
+Preliminary sketch of API, cut-n-pasted-n-modified email from
+Ben Herrenschmidt, circa 5 april 2005
+
+The error recovery API support is exposed to the driver in the form of
+a structure of function pointers pointed to by a new field in struct
+pci_driver. The absence of this pointer in pci_driver denotes an
+"non-aware" driver, behaviour on these is platform dependant.
+Platforms like ppc64 can try to simulate pci hotplug remove/add.
+
+The definition of "pci_error_token" is not covered here. It is based on
+Seto's work on the synchronous error detection. We still need to define
+functions for extracting infos out of an opaque error token. This is
+separate from this API.
+
+This structure has the form:
+
+struct pci_error_handlers
+{
+	int (*error_detected)(struct pci_dev *dev, pci_error_token error);
+	int (*mmio_enabled)(struct pci_dev *dev);
+	int (*resume)(struct pci_dev *dev);
+	int (*link_reset)(struct pci_dev *dev);
+	int (*slot_reset)(struct pci_dev *dev);
+};
+
+A driver doesn't have to implement all of these callbacks. The
+only mandatory one is error_detected(). If a callback is not
+implemented, the corresponding feature is considered unsupported.
+For example, if mmio_enabled() and resume() aren't there, then the
+driver is assumed as not doing any direct recovery and requires
+a reset. If link_reset() is not implemented, the card is assumed as
+not caring about link resets, in which case, if recover is supported,
+the core can try recover (but not slot_reset() unless it really did
+reset the slot). If slot_reset() is not supported, link_reset() can
+be called instead on a slot reset.
+
+At first, the call will always be :
+
+	1) error_detected()
+
+	Error detected. This is sent once after an error has been detected. At
+this point, the device might not be accessible anymore depending on the
+platform (the slot will be isolated on ppc64). The driver may already
+have "noticed" the error because of a failing IO, but this is the proper
+"synchronisation point", that is, it gives a chance to the driver to
+cleanup, waiting for pending stuff (timers, whatever, etc...) to
+complete; it can take semaphores, schedule, etc... everything but touch
+the device. Within this function and after it returns, the driver
+shouldn't do any new IOs. Called in task context. This is sort of a
+"quiesce" point. See note about interrupts at the end of this doc.
+
+	Result codes:
+		- PCIERR_RESULT_CAN_RECOVER:
+		  Driever returns this if it thinks it might be able to recover
+		  the HW by just banging IOs or if it wants to be given
+		  a chance to extract some diagnostic informations (see
+		  below).
+		- PCIERR_RESULT_NEED_RESET:
+		  Driver returns this if it thinks it can't recover unless the
+		  slot is reset.
+		- PCIERR_RESULT_DISCONNECT:
+		  Return this if driver thinks it won't recover at all,
+		  (this will detach the driver ? or just leave it
+		  dangling ? to be decided)
+
+So at this point, we have called error_detected() for all drivers
+on the segment that had the error. On ppc64, the slot is isolated. What
+happens now typically depends on the result from the drivers. If all
+drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would
+re-enable IOs on the slot (or do nothing special if the platform doesn't
+isolate slots) and call 2). If not and we can reset slots, we go to 4),
+if neither, we have a dead slot. If it's an hotplug slot, we might
+"simulate" reset by triggering HW unplug/replug though.
+
+>>> Current ppc64 implementation assumes that a device driver will
+>>> *not* schedule or semaphore in this routine; the current ppc64
+>>> implementation uses one kernel thread to notify all devices;
+>>> thus, of one device sleeps/schedules, all devices are affected.
+>>> Doing better requires complex multi-threaded logic in the error
+>>> recovery implementation (e.g. waiting for all notification threads
+>>> to "join" before proceeding with recovery.)  This seems excessively
+>>> complex and not worth implementing.
+
+>>> The current ppc64 implementation doesn't much care if the device
+>>> attempts i/o at this point, or not.  I/O's will fail, returning
+>>> a value of 0xff on read, and writes will be dropped. If the device
+>>> driver attempts more than 10K I/O's to a frozen adapter, it will
+>>> assume that the device driver has gone into an infinite loop, and
+>>> it will panic the the kernel.
+
+	2) mmio_enabled()
+
+	This is the "early recovery" call. IOs are allowed again, but DMA is
+not (hrm... to be discussed, I prefer not), with some restrictions. This
+is NOT a callback for the driver to start operations again, only to
+peek/poke at the device, extract diagnostic information, if any, and
+eventually do things like trigger a device local reset or some such,
+but not restart operations. This is sent if all drivers on a segment
+agree that they can try to recover and no automatic link reset was
+performed by the HW. If the platform can't just re-enable IOs without
+a slot reset or a link reset, it doesn't call this callback and goes
+directly to 3) or 4). All IOs should be done _synchronously_ from
+within this callback, errors triggered by them will be returned via
+the normal pci_check_whatever() api, no new error_detected() callback
+will be issued due to an error happening here. However, such an error
+might cause IOs to be re-blocked for the whole segment, and thus
+invalidate the recovery that other devices on the same segment might
+have done, forcing the whole segment into one of the next states,
+that is link reset or slot reset.
+
+	Result codes:
+		- PCIERR_RESULT_RECOVERED
+		  Driver returns this if it thinks the device is fully
+		  functionnal and thinks it is ready to start
+		  normal driver operations again. There is no
+		  guarantee that the driver will actually be
+		  allowed to proceed, as another driver on the
+		  same segment might have failed and thus triggered a
+		  slot reset on platforms that support it.
+
+		- PCIERR_RESULT_NEED_RESET
+		  Driver returns this if it thinks the device is not
+		  recoverable in it's current state and it needs a slot
+		  reset to proceed.
+
+		- PCIERR_RESULT_DISCONNECT
+		  Same as above. Total failure, no recovery even after
+		  reset driver dead. (To be defined more precisely)
+
+>>> The current ppc64 implementation does not implement this callback.
+
+	3) link_reset()
+
+	This is called after the link has been reset. This is typically
+a PCI Express specific state at this point and is done whenever a
+non-fatal error has been detected that can be "solved" by resetting
+the link. This call informs the driver of the reset and the driver
+should check if the device appears to be in working condition.
+This function acts a bit like 2) mmio_enabled(), in that the driver
+is not supposed to restart normal driver I/O operations right away.
+Instead, it should just "probe" the device to check it's recoverability
+status. If all is right, then the core will call resume() once all
+drivers have ack'd link_reset().
+
+	Result codes:
+		(identical to mmio_enabled)
+
+>>> The current ppc64 implementation does not implement this callback.
+
+	4) slot_reset()
+
+	This is called after the slot has been soft or hard reset by the
+platform.  A soft reset consists of asserting the adapter #RST line
+and then restoring the PCI BARs and PCI configuration header. If the
+platform supports PCI hotplug, then it might instead perform a hard
+reset by toggling power on the slot off/on. This call gives drivers
+the chance to re-initialize the hardware (re-download firmware, etc.),
+but drivers shouldn't restart normal I/O processing operations at
+this point.  (See note about interrupts; interrupts aren't guaranteed
+to be delivered until the resume() callback has been called). If all
+device drivers report success on this callback, the patform will call
+resume() to complete the error handling and let the driver restart
+normal I/O processing.
+
+A driver can still return a critical failure for this function if
+it can't get the device operational after reset.  If the platform
+previously tried a soft reset, it migh now try a hard reset (power
+cycle) and then call slot_reset() again.  It the device still can't
+be recovered, there is nothing more that can be done;  the platform
+will typically report a "permanent failure" in such a case.  The
+device will be considered "dead" in this case.
+
+	Result codes:
+		- PCIERR_RESULT_DISCONNECT
+		Same as above.
+
+>>> The current ppc64 implementation does not try a power-cycle reset
+>>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should.
+
+	5) resume()
+
+	This is called if all drivers on the segment have returned
+PCIERR_RESULT_RECOVERED from one of the 3 prevous callbacks.
+That basically tells the driver to restart activity, tht everything
+is back and running. No result code is taken into account here. If
+a new error happens, it will restart a new error handling process.
+
+That's it. I think this covers all the possibilities. The way those
+callbacks are called is platform policy. A platform with no slot reset
+capability for example may want to just "ignore" drivers that can't
+recover (disconnect them) and try to let other cards on the same segment
+recover. Keep in mind that in most real life cases, though, there will
+be only one driver per segment.
+
+Now, there is a note about interrupts. If you get an interrupt and your
+device is dead or has been isolated, there is a problem :)
+
+After much thinking, I decided to leave that to the platform. That is,
+the recovery API only precies that:
+
+ - There is no guarantee that interrupt delivery can proceed from any
+device on the segment starting from the error detection and until the
+restart callback is sent, at which point interrupts are expected to be
+fully operational.
+
+ - There is no guarantee that interrupt delivery is stopped, that is, ad
+river that gets an interrupts after detecting an error, or that detects
+and error within the interrupt handler such that it prevents proper
+ack'ing of the interrupt (and thus removal of the source) should just
+return IRQ_NOTHANDLED. It's up to the platform to deal with taht
+condition, typically by masking the irq source during the duration of
+the error handling. It is expected that the platform "knows" which
+interrupts are routed to error-management capable slots and can deal
+with temporarily disabling that irq number during error processing (this
+isn't terribly complex). That means some IRQ latency for other devices
+sharing the interrupt, but there is simply no other way. High end
+platforms aren't supposed to share interrupts between many devices
+anyway :)
+
+
+Revised: 31 May 2005 Linas Vepstas <linas@austin.ibm.com>
Index: linux-2.6.14-git3/MAINTAINERS
===================================================================
--- linux-2.6.14-git3.orig/MAINTAINERS	2005-11-02 14:29:19.433257684 -0600
+++ linux-2.6.14-git3/MAINTAINERS	2005-11-02 14:34:25.700322915 -0600
@@ -1885,6 +1885,13 @@
 L:	linux-abi-devel@lists.sourceforge.net
 S:	Maintained
 
+PCI ERROR RECOVERY
+P: Linas Vepstas
+M: linas@austin.ibm.com
+L:	linux-kernel@vger.kernel.org
+L:	linux-pci@atrey.karlin.mff.cuni.cz
+S:	Supported
+
 PCI SOUND DRIVERS (ES1370, ES1371 and SONICVIBES)
 P:	Thomas Sailer
 M:	sailer@ife.ee.ethz.ch

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (14 preceding siblings ...)
  2005-11-04  0:50 ` [PATCH 15/42]: Documentation: PCI Error Recovery Linas Vepstas
@ 2005-11-04  0:50 ` Linas Vepstas
  2005-11-05  6:11   ` Greg KH
  2005-11-04  0:50 ` [PATCH 17/42]: ppc64: mark failed devices Linas Vepstas
                   ` (27 subsequent siblings)
  43 siblings, 1 reply; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:50 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

16-pci-error-recovery_header.patch

PCI Error Recovery: header file patch

Various PCI bus errors can be signaled by newer PCI controllers. Recovering 
from those errors requires an infrastructure to notify affected device drivers 
of the error, and a way of walking through a reset sequence.  This patch adds 
a set of callbacks to be used by error recovery routines to notify device 
drivers of the various stages of recovery.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

--
 include/linux/pci.h |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 49 insertions(+)

Index: linux-2.6.14-git3/include/linux/pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/linux/pci.h	2005-11-02 14:29:18.856338553 -0600
+++ linux-2.6.14-git3/include/linux/pci.h	2005-11-02 14:34:32.272401512 -0600
@@ -78,6 +78,16 @@
 #define PCI_UNKNOWN	((pci_power_t __force) 5)
 #define PCI_POWER_ERROR	((pci_power_t __force) -1)
 
+/** The pci_channel state describes connectivity between the CPU and
+ *  the pci device.  If some PCI bus between here and the pci device
+ *  has crashed or locked up, this info is reflected here.
+ */
+enum pci_channel_state {
+	pci_channel_io_normal = 0, /* I/O channel is in normal state */
+	pci_channel_io_frozen = 1, /* I/O to channel is blocked */
+	pci_channel_io_perm_failure, /* PCI card is dead */
+};
+
 /*
  * The pci_dev structure is used to describe PCI devices.
  */
@@ -110,6 +120,7 @@
 					   this is D0-D3, D0 being fully functional,
 					   and D3 being off. */
 
+	enum pci_channel_state error_state;  /* current connectivity state */
 	struct	device	dev;		/* Generic device interface */
 
 	/* device is compatible with these IDs */
@@ -232,6 +243,43 @@
 	unsigned int use_driver_data:1; /* pci_driver->driver_data is used */
 };
 
+/* ---------------------------------------------------------------- */
+/** PCI error recovery infrastructure.  If a PCI device driver provides
+ *  a set fof callbacks in struct pci_error_handlers, then that device driver
+ *  will be notified of PCI bus errors, and will be driven to recovery
+ *  when an error occurs.
+ */
+
+enum pcierr_result {
+	PCIERR_RESULT_NONE=0,        /* no result/none/not supported in device driver */
+	PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
+	PCIERR_RESULT_NEED_RESET,    /* Device driver wants slot to be reset. */
+	PCIERR_RESULT_DISCONNECT,    /* Device has completely failed, is unrecoverable */
+	PCIERR_RESULT_RECOVERED,     /* Device driver is fully recovered and operational */
+};
+
+/* PCI bus error event callbacks */
+struct pci_error_handlers
+{
+	/* PCI bus error detected on this device */
+	int (*error_detected)(struct pci_dev *dev,
+	                      enum pci_channel_state error);
+
+	/* MMIO has been re-enabled, but not DMA */
+	int (*mmio_enabled)(struct pci_dev *dev);
+
+	/* PCI Express link has been reset */
+	int (*link_reset)(struct pci_dev *dev);
+
+	/* PCI slot has been reset */
+	int (*slot_reset)(struct pci_dev *dev);
+
+	/* Device driver may resume normal operations */
+	void (*resume)(struct pci_dev *dev);
+};
+
+/* ---------------------------------------------------------------- */
+
 struct module;
 struct pci_driver {
 	struct list_head node;
@@ -245,6 +293,7 @@
 	int  (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable);   /* Enable wake event */
 	void (*shutdown) (struct pci_dev *dev);
 
+	struct pci_error_handlers *err_handler;
 	struct device_driver	driver;
 	struct pci_dynids dynids;
 };

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 17/42]: ppc64: mark failed devices
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (15 preceding siblings ...)
  2005-11-04  0:50 ` [PATCH 16/42]: PCI: PCI Error reporting callbacks Linas Vepstas
@ 2005-11-04  0:50 ` Linas Vepstas
  2005-11-04  0:51 ` [PATCH 18/42]: ppc64: bugfix: crash on dlpar slot add, remove Linas Vepstas
                   ` (26 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:50 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

17-eeh-slot-marking-bug.patch

A device that experiences a PCI outage may be just one deivce out 
of many that was affected. In order to avoid repeated reports of 
a failure, the entire tree of affected devices should be marked 
as failed. This patch marks up the entire tree.

Signed-off-by: Linas Vepstas <linas@linas.org>


Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:34:19.926132452 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:35:39.290005477 -0600
@@ -479,32 +479,47 @@
  *  an interrupt context, which is bad.
  */
 
-static inline void __eeh_mark_slot (struct device_node *dn)
+static inline void __eeh_mark_slot (struct device_node *dn, int mode_flag)
 {
 	while (dn) {
-		PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED;
+		if (PCI_DN(dn)) {
+			PCI_DN(dn)->eeh_mode |= mode_flag;
 
-		if (dn->child)
-			__eeh_mark_slot (dn->child);
+			if (dn->child)
+				__eeh_mark_slot (dn->child, mode_flag);
+		}
 		dn = dn->sibling;
 	}
 }
 
-static inline void __eeh_clear_slot (struct device_node *dn)
+void eeh_mark_slot (struct device_node *dn, int mode_flag)
+{
+	dn = find_device_pe (dn);
+	PCI_DN(dn)->eeh_mode |= mode_flag;
+	__eeh_mark_slot (dn->child, mode_flag);
+}
+
+static inline void __eeh_clear_slot (struct device_node *dn, int mode_flag)
 {
 	while (dn) {
-		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED;
-		if (dn->child)
-			__eeh_clear_slot (dn->child);
+		if (PCI_DN(dn)) {
+			PCI_DN(dn)->eeh_mode &= ~mode_flag;
+			PCI_DN(dn)->eeh_check_count = 0;
+			if (dn->child)
+				__eeh_clear_slot (dn->child, mode_flag);
+		}
 		dn = dn->sibling;
 	}
 }
 
-static inline void eeh_clear_slot (struct device_node *dn)
+void eeh_clear_slot (struct device_node *dn, int mode_flag)
 {
 	unsigned long flags;
 	spin_lock_irqsave(&confirm_error_lock, flags);
-	__eeh_clear_slot (dn);
+	dn = find_device_pe (dn);
+	PCI_DN(dn)->eeh_mode &= ~mode_flag;
+	PCI_DN(dn)->eeh_check_count = 0;
+	__eeh_clear_slot (dn->child, mode_flag);
 	spin_unlock_irqrestore(&confirm_error_lock, flags);
 }
 
@@ -529,7 +544,6 @@
 	int rets[3];
 	unsigned long flags;
 	struct pci_dn *pdn;
-	struct device_node *pe_dn;
 	int rc = 0;
 
 	__get_cpu_var(total_mmio_ffs)++;
@@ -631,8 +645,7 @@
 	/* Avoid repeated reports of this failure, including problems
 	 * with other functions on this device, and functions under
 	 * bridges. */
-	pe_dn = find_device_pe (dn);
-	__eeh_mark_slot (pe_dn);
+	eeh_mark_slot (dn, EEH_MODE_ISOLATED);
 	spin_unlock_irqrestore(&confirm_error_lock, flags);
 
 	eeh_send_failure_event (dn, dev, rets[0], rets[2]);
@@ -744,9 +757,6 @@
 		        rc, state, pdn->node->full_name);
 		return;
 	}
-
-	if (state == 0)
-		eeh_clear_slot (pdn->node->parent->child);
 }
 
 /** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second
@@ -765,6 +775,12 @@
 
 #define PCI_BUS_RST_HOLD_TIME_MSEC 250
 	msleep (PCI_BUS_RST_HOLD_TIME_MSEC);
+	
+	/* We might get hit with another EEH freeze as soon as the 
+	 * pci slot reset line is dropped. Make sure we don't miss
+	 * these, and clear the flag now. */
+	eeh_clear_slot (pdn->node, EEH_MODE_ISOLATED);
+
 	rtas_pci_slot_reset (pdn, 0);
 
 	/* After a PCI slot has been reset, the PCI Express spec requires
Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h	2005-11-02 14:34:19.931131751 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h	2005-11-02 14:35:39.295004776 -0600
@@ -86,6 +86,13 @@
 
 int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
 
+/**
+ * mark and clear slots: find "partition endpoint" PE and set or 
+ * clear the flags for each subnode of the PE.
+ */
+void eeh_mark_slot (struct device_node *dn, int mode_flag);
+void eeh_clear_slot (struct device_node *dn, int mode_flag);
+
 #endif
 
 #endif /* _ASM_POWERPC_PPC_PCI_H */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 18/42]: ppc64: bugfix: crash on dlpar slot add, remove
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (16 preceding siblings ...)
  2005-11-04  0:50 ` [PATCH 17/42]: ppc64: mark failed devices Linas Vepstas
@ 2005-11-04  0:51 ` Linas Vepstas
  2005-11-04  0:51 ` [PATCH 19/42]: ppc64: bugfix: crash on PHB add Linas Vepstas
                   ` (25 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:51 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

18-crash-on-pci-slot-add.patch

This patch fixes a bugs related to dlpar slot add.

-- Crash is due to the fact the some children 
   of pci nodes are not pci nodes themselves, and thus do not 
   have pci_dn structures.  For example:
        /pci@800000020000002/pci@2,3/usb@1/hub@1
        /pci@800000020000002/pci@2,3/usb@1,1/hub@1

   A typical stack trace:
        Vector: 300 (Data Access) at [c0000000555637d0]
         pc: c000000000202a50: .dlpar_add_slot+0x108/0x410
             c000000000202e78 .add_slot_store+0x7c/0xac
             c000000000202da0 .dlpar_attr_store+0x48/0x64
             c0000000000f8ee4 .sysfs_write_file+0x100/0x1a0

   A similar stack trace is involved for the slot remove.

This code survived testing, of adding and removing different slots,
23 times each, so far, as of this writing.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>


emailed to 
To: paulus@samba.org
Cc: linuxppc64-dev@ozlabs.org, johnrose@linux.ibm.com,
        linux-kernel@vger.kernel.org
Subject: [PATCH 2/2] ppc64: Crash in DLPAR code on remove operation

on 4 October 2005

Index: linux-2.6.14-git6/arch/ppc64/kernel/pci_dn.c
===================================================================
--- linux-2.6.14-git6.orig/arch/ppc64/kernel/pci_dn.c	2005-11-03 14:15:40.520737607 -0600
+++ linux-2.6.14-git6/arch/ppc64/kernel/pci_dn.c	2005-11-03 14:15:45.182083115 -0600
@@ -194,7 +194,10 @@
 
 	switch (action) {
 	case PSERIES_RECONFIG_ADD:
-		pci = np->parent->data;
+		pci = PCI_DN(np->parent);
+		if (!pci)
+			return NOTIFY_OK;
+
 		update_dn_pci_info(np, pci->phb);
 		break;
 	default:
Index: linux-2.6.14-git6/arch/powerpc/platforms/pseries/iommu.c
===================================================================
--- linux-2.6.14-git6.orig/arch/powerpc/platforms/pseries/iommu.c	2005-11-03 14:14:32.131340002 -0600
+++ linux-2.6.14-git6/arch/powerpc/platforms/pseries/iommu.c	2005-11-03 14:49:42.621970876 -0600
@@ -494,10 +494,13 @@
 {
 	int err = NOTIFY_OK;
 	struct device_node *np = node;
-	struct pci_dn *pci = np->data;
+	struct pci_dn *pci;
 
 	switch (action) {
 	case PSERIES_RECONFIG_REMOVE:
+		pci = PCI_DN(np);
+		if (!pci)
+			return NOTIFY_OK;
 		if (pci->iommu_table &&
 		    get_property(np, "ibm,dma-window", NULL))
 			iommu_free_table(np);

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 19/42]: ppc64: bugfix: crash on PHB add
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (17 preceding siblings ...)
  2005-11-04  0:51 ` [PATCH 18/42]: ppc64: bugfix: crash on dlpar slot add, remove Linas Vepstas
@ 2005-11-04  0:51 ` Linas Vepstas
  2005-11-04 16:20   ` John Rose
  2005-11-04  0:51 ` [PATCH 20/42]: ppc64: PCI hotplug common code elimination Linas Vepstas
                   ` (24 subsequent siblings)
  43 siblings, 1 reply; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:51 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

19-rpaphp-crashing.patch

This patch fixes a bug related to dlpar PHB add, after a PHB removal.

-- The crash was due to the PHB not having a pci_dn structure yet,
   when the phb is being added.

This code survived testing, of adding and removeig the PHB and all slots
underneath it, 17 times so far, as of this writing.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

emailed to 
To: paulus@samba.org
Cc: linuxppc64-dev@ozlabs.org, linux-pci@atrey.karlin.mff.cuni.cz,
        johnrose@linux.ibm.com, linux-kernel@vger.kernel.org
Subject: [PATCH] rpaphp: PCI Hotplug crash on PHB DLPAR add

on 4 October 2005


Index: linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpadlpar_core.c	2005-11-02 14:29:02.115685162 -0600
+++ linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c	2005-11-02 14:35:52.800111285 -0600
@@ -306,7 +306,7 @@
 {
 	struct pci_controller *phb;
 
-	if (PCI_DN(dn)->phb) {
+	if (PCI_DN(dn) && PCI_DN(dn)->phb) {
 		/* PHB already exists */
 		return -EINVAL;
 	}

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 20/42]: ppc64: PCI hotplug common code elimination
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (18 preceding siblings ...)
  2005-11-04  0:51 ` [PATCH 19/42]: ppc64: bugfix: crash on PHB add Linas Vepstas
@ 2005-11-04  0:51 ` Linas Vepstas
  2005-11-04  0:51 ` [PATCH 21/42]: PCI: cleanup/simplify ppc64 PCI hotplug code Linas Vepstas
                   ` (23 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:51 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

20-rpaphp-eeh-cleanup.patch

This patch move some code from the rpaphp directory, to the ppc64 directory,
where it should have been all along (Among other things, I need it in the 
ppc64 directory for the PCI error recovery.)

Please note that patch affects TWO maintainers: Paul, after applying
the ppc64 part, please ask that GregKH appli the PCI part. It is safe
to have the ppc64 part go in first. It would be bad to have the 
PCI part go in first.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:35:39.290005477 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:36:41.255317484 -0600
@@ -1093,6 +1093,15 @@
 }
 EXPORT_SYMBOL_GPL(eeh_add_device_early);
 
+void eeh_add_device_tree_early(struct device_node *dn)
+{
+	struct device_node *sib;
+	for (sib = dn->child; sib; sib = sib->sibling)
+		eeh_add_device_tree_early(sib);
+	eeh_add_device_early(dn);
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_tree_early);
+
 /**
  * eeh_add_device_late - perform EEH initialization for the indicated pci device
  * @dev: pci device for which to set up EEH
@@ -1147,6 +1156,23 @@
 }
 EXPORT_SYMBOL_GPL(eeh_remove_device);
 
+void eeh_remove_bus_device(struct pci_dev *dev)
+{
+	eeh_remove_device(dev);
+	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+		struct pci_bus *bus = dev->subordinate;
+		struct list_head *ln;
+		if (!bus)
+			return; 
+		for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) {
+			struct pci_dev *pdev = pci_dev_b(ln);
+			if (pdev)
+				eeh_remove_bus_device(pdev);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(eeh_remove_bus_device);
+
 static int proc_eeh_show(struct seq_file *m, void *v)
 {
 	unsigned int cpu;
Index: linux-2.6.14-git3/include/asm-ppc64/eeh.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-ppc64/eeh.h	2005-11-02 14:32:35.725740824 -0600
+++ linux-2.6.14-git3/include/asm-ppc64/eeh.h	2005-11-02 14:36:41.263316362 -0600
@@ -55,6 +55,7 @@
  * to finish the eeh setup for this device.
  */
 void eeh_add_device_early(struct device_node *);
+void eeh_add_device_tree_early(struct device_node *);
 void eeh_add_device_late(struct pci_dev *);
 
 /**
@@ -70,6 +71,15 @@
 void eeh_remove_device(struct pci_dev *);
 
 /**
+ * eeh_remove_device_recursive - undo EEH for device & children.
+ * @dev: pci device to be removed
+ *
+ * As above, this removes the device; it also removes child
+ * pci devices as well.
+ */
+void eeh_remove_bus_device(struct pci_dev *);
+
+/**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
  *
  * If this macro yields TRUE, the caller relays to eeh_check_failure()
Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c	2005-11-02 14:28:58.955128188 -0600
+++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c	2005-11-02 14:36:41.271315241 -0600
@@ -253,17 +253,6 @@
 	return dev;
 }
 
-static void enable_eeh(struct device_node *dn)
-{
-	struct device_node *sib;
-
-	for (sib = dn->child; sib; sib = sib->sibling) 
-		enable_eeh(sib);
-	eeh_add_device_early(dn);
-	return;
-	
-}
-
 static void print_slot_pci_funcs(struct pci_bus *bus)
 {
 	struct device_node *dn;
@@ -289,7 +278,7 @@
 	if (!dn)
 		goto exit;
 
-	enable_eeh(dn);
+	eeh_add_device_tree_early(dn);
 	dev = rpaphp_pci_config_slot(bus);
 	if (!dev) {
 		err("%s: can't find any devices.\n", __FUNCTION__);
@@ -303,30 +292,12 @@
 }
 EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter);
 
-static void rpaphp_eeh_remove_bus_device(struct pci_dev *dev)
-{
-	eeh_remove_device(dev);
-	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
-		struct pci_bus *bus = dev->subordinate;
-		struct list_head *ln;
-		if (!bus)
-			return; 
-		for (ln = bus->devices.next; ln != &bus->devices; ln = ln->next) {
-			struct pci_dev *pdev = pci_dev_b(ln);
-			if (pdev)
-				rpaphp_eeh_remove_bus_device(pdev);
-		}
-
-	}
-	return;
-}
-
 int rpaphp_unconfig_pci_adapter(struct pci_bus *bus)
 {
 	struct pci_dev *dev, *tmp;
 
 	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
-		rpaphp_eeh_remove_bus_device(dev);
+		eeh_remove_bus_device(dev);
 		pci_remove_bus_device(dev);
 	}
 	return 0;

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 21/42]: PCI: cleanup/simplify ppc64 PCI hotplug code
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (19 preceding siblings ...)
  2005-11-04  0:51 ` [PATCH 20/42]: ppc64: PCI hotplug common code elimination Linas Vepstas
@ 2005-11-04  0:51 ` Linas Vepstas
  2005-11-04  0:52 ` [PATCH 22/42]: PCI: remove duplicted pci " Linas Vepstas
                   ` (22 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:51 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

21-rpaphp-eeh-cleanup.patch

This patch cleans up some rpa dlpar code. Basically, 
the rpaphp_config_pci_adapter() was a wrapper routine, which
made two calls, and wrapped a bunch of verbose no-op code
around it.  This was consolidated wih the routine it called.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c	2005-11-02 14:36:41.271315241 -0600
+++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c	2005-11-02 14:36:48.081360405 -0600
@@ -221,18 +221,21 @@
  rpaphp_pci_config_slot() will  configure all devices under the
  given slot->dn and return the the first pci_dev.
  *****************************************************************************/
-static struct pci_dev *
-rpaphp_pci_config_slot(struct pci_bus *bus)
+int
+rpaphp_config_pci_adapter(struct pci_bus *bus)
 {
 	struct device_node *dn = pci_bus_to_OF_node(bus);
 	struct pci_dev *dev = NULL;
+	int rc = -ENODEV;
 	int slotno;
 	int num;
 
 	dbg("Enter %s: dn=%s bus=%s\n", __FUNCTION__, dn->full_name, bus->name);
 	if (!dn || !dn->child)
-		return NULL;
+		goto exit;
 
+	eeh_add_device_tree_early(dn);
+	
 	slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
 
 	/* pci_scan_slot should find all children */
@@ -243,15 +246,23 @@
 	}
 	if (list_empty(&bus->devices)) {
 		err("%s: No new device found\n", __FUNCTION__);
-		return NULL;
+		goto exit;
 	}
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
 			rpaphp_pci_config_bridge(dev);
 	}
 
-	return dev;
+	dbg("%s: pci_devs of slot[%s]\n", __FUNCTION__, dn->full_name);
+	list_for_each_entry (dev, &bus->devices, bus_list)
+		dbg("\t%s\n", pci_name(dev));
+
+	rc = 0;
+exit:
+	dbg("Exit %s:  rc=%d\n", __FUNCTION__, rc);
+	return rc;
 }
+EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter);
 
 static void print_slot_pci_funcs(struct pci_bus *bus)
 {
@@ -268,30 +279,6 @@
 	return;
 }
 
-int rpaphp_config_pci_adapter(struct pci_bus *bus)
-{
-	struct device_node *dn = pci_bus_to_OF_node(bus);
-	struct pci_dev *dev;
-	int rc = -ENODEV;
-
-	dbg("Entry %s: slot[%s]\n", __FUNCTION__, dn->full_name);
-	if (!dn)
-		goto exit;
-
-	eeh_add_device_tree_early(dn);
-	dev = rpaphp_pci_config_slot(bus);
-	if (!dev) {
-		err("%s: can't find any devices.\n", __FUNCTION__);
-		goto exit;
-	}
-	print_slot_pci_funcs(bus);
-	rc = 0;
-exit:
-	dbg("Exit %s:  rc=%d\n", __FUNCTION__, rc);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter);
-
 int rpaphp_unconfig_pci_adapter(struct pci_bus *bus)
 {
 	struct pci_dev *dev, *tmp;

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 22/42]: PCI: remove duplicted pci hotplug code
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (20 preceding siblings ...)
  2005-11-04  0:51 ` [PATCH 21/42]: PCI: cleanup/simplify ppc64 PCI hotplug code Linas Vepstas
@ 2005-11-04  0:52 ` Linas Vepstas
  2005-11-04 21:54   ` John Rose
  2005-11-04  0:52 ` [PATCH 23/42]: ppc64: migrate common PCI " Linas Vepstas
                   ` (21 subsequent siblings)
  43 siblings, 1 reply; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:52 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

22-rpaphp-eliminate-dupe-code.patch

The RPAPHP code contains two routines that appear to be gratiuitous copies
of very similar pci code.  In particular, 
   
   rpaphp_claim_resource ~~ pci_claim_resource
   rpadlpar_claim_one_bus == pcibios_claim_one_bus

This patch removes the rpaphp versions of the code.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>


Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c	2005-11-02 14:36:48.081360405 -0600
+++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c	2005-11-02 14:36:51.785840999 -0600
@@ -62,28 +62,6 @@
 }
 EXPORT_SYMBOL_GPL(rpaphp_find_pci_bus);
 
-int rpaphp_claim_resource(struct pci_dev *dev, int resource)
-{
-	struct resource *res = &dev->resource[resource];
-	struct resource *root = pci_find_parent_resource(dev, res);
-	char *dtype = resource < PCI_BRIDGE_RESOURCES ? "device" : "bridge";
-	int err = -EINVAL;
-
-	if (root != NULL) {
-		err = request_resource(root, res);
-	}
-
-	if (err) {
-		err("PCI: %s region %d of %s %s [%lx:%lx]\n",
-		    root ? "Address space collision on" :
-		    "No parent found for",
-		    resource, dtype, pci_name(dev), res->start, res->end);
-	}
-	return err;
-}
-
-EXPORT_SYMBOL_GPL(rpaphp_claim_resource);
-
 static int rpaphp_get_sensor_state(struct slot *slot, int *state)
 {
 	int rc;
@@ -178,7 +156,7 @@
 
 				if (r->parent || !r->start || !r->flags)
 					continue;
-				rpaphp_claim_resource(dev, i);
+				pci_claim_resource(dev, i);
 			}
 		}
 	}
Index: linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpadlpar_core.c	2005-11-02 14:35:52.800111285 -0600
+++ linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c	2005-11-02 14:36:51.793839877 -0600
@@ -112,28 +112,6 @@
         return NULL;
 }
 
-static void rpadlpar_claim_one_bus(struct pci_bus *b)
-{
-	struct list_head *ld;
-	struct pci_bus *child_bus;
-
-	for (ld = b->devices.next; ld != &b->devices; ld = ld->next) {
-		struct pci_dev *dev = pci_dev_b(ld);
-		int i;
-
-		for (i = 0; i < PCI_NUM_RESOURCES; i++) {
-			struct resource *r = &dev->resource[i];
-
-			if (r->parent || !r->start || !r->flags)
-				continue;
-			rpaphp_claim_resource(dev, i);
-		}
-	}
-
-	list_for_each_entry(child_bus, &b->children, node)
-		rpadlpar_claim_one_bus(child_bus);
-}
-
 static int pci_add_secondary_bus(struct device_node *dn,
 		struct pci_dev *bridge_dev)
 {
@@ -158,7 +136,7 @@
 	pcibios_fixup_bus(child);
 
 	/* Claim new bus resources */
-	rpadlpar_claim_one_bus(bridge_dev->bus);
+	pcibios_claim_one_bus(bridge_dev->bus);
 
 	if (hose->last_busno < child->number)
 		hose->last_busno = child->number;
Index: linux-2.6.14-git3/arch/ppc64/kernel/pci.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/pci.c	2005-11-02 14:28:57.119385510 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/pci.c	2005-11-02 14:36:51.808837774 -0600
@@ -197,7 +197,7 @@
 	spin_unlock(&hose_spinlock);
 }
 
-static void __init pcibios_claim_one_bus(struct pci_bus *b)
+void __devinit pcibios_claim_one_bus(struct pci_bus *b)
 {
 	struct pci_dev *dev;
 	struct pci_bus *child_bus;
Index: linux-2.6.14-git3/include/asm-ppc64/pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-ppc64/pci.h	2005-11-02 14:28:57.119385510 -0600
+++ linux-2.6.14-git3/include/asm-ppc64/pci.h	2005-11-02 14:36:51.813837073 -0600
@@ -160,6 +160,8 @@
 extern void
 pcibios_fixup_device_resources(struct pci_dev *dev, struct pci_bus *bus);
 
+extern void pcibios_claim_one_bus(struct pci_bus *b);
+
 extern struct pci_controller *init_phb_dynamic(struct device_node *dn);
 
 extern int pci_read_irq_line(struct pci_dev *dev);

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 23/42]: ppc64: migrate common PCI hotplug code
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (21 preceding siblings ...)
  2005-11-04  0:52 ` [PATCH 22/42]: PCI: remove duplicted pci " Linas Vepstas
@ 2005-11-04  0:52 ` Linas Vepstas
  2005-11-04  0:52 ` [PATCH 24/42]: ppc64: PCI Error Recovery: PPC64 core recovery routines Linas Vepstas
                   ` (20 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:52 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

23-rpaphp-migrate.patch

This patch moves some pci device add & remove code from the PCI
hotplug directory to the arch/ppc64/kernel directory, and cleans 
it up a tad. The primary reason for this is that the code performs
some fairly generic operations that are shared with the PCI error
recovery code (living in the arch/ppc64/kernel directory).

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/pci_dlpar.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/pci_dlpar.c	2005-11-02 14:39:24.724396565 -0600
@@ -0,0 +1,174 @@
+/*
+ * PCI Dynamic LPAR, PCI Hot Plug and PCI EEH recovery code
+ * for RPA-compliant PPC64 platform.
+ * Copyright (C) 2003 Linda Xie <lxie@us.ibm.com>
+ * Copyright (C) 2005 International Business Machines
+ *
+ * Updates, 2005, John Rose <johnrose@austin.ibm.com>
+ * Updates, 2005, Linas Vepstas <linas@austin.ibm.com>
+ *
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/pci.h>
+#include <asm/pci-bridge.h>
+
+static struct pci_bus *
+find_bus_among_children(struct pci_bus *bus,
+                        struct device_node *dn)
+{
+	struct pci_bus *child = NULL;
+	struct list_head *tmp;
+	struct device_node *busdn;
+
+	busdn = pci_bus_to_OF_node(bus);
+	if (busdn == dn)
+		return bus;
+
+	list_for_each(tmp, &bus->children) {
+		child = find_bus_among_children(pci_bus_b(tmp), dn);
+		if (child)
+			break;
+	};
+	return child;
+}
+
+struct pci_bus *
+pcibios_find_pci_bus(struct device_node *dn)
+{
+	struct pci_dn *pdn = dn->data;
+
+	if (!pdn  || !pdn->phb || !pdn->phb->bus)
+		return NULL;
+
+	return find_bus_among_children(pdn->phb->bus, dn);
+}
+
+/**
+ * pcibios_remove_pci_devices - remove all devices under this bus
+ *
+ * Remove all of the PCI devices under this bus both from the
+ * linux pci device tree, and from the ppc64 EEH address cache.
+ */
+void
+pcibios_remove_pci_devices(struct pci_bus *bus)
+{
+	struct pci_dev *dev, *tmp;
+
+	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
+		eeh_remove_bus_device(dev);
+		pci_remove_bus_device(dev);
+	}
+}
+
+/* Must be called before pci_bus_add_devices */
+static void
+pcibios_fixup_new_pci_devices(struct pci_bus *bus, int fix_bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		/*
+		 * Skip already-present devices (which are on the
+		 * global device list.)
+		 */
+		if (list_empty(&dev->global_list)) {
+			int i;
+
+			/* Need to setup IOMMU tables */
+			ppc_md.iommu_dev_setup(dev);
+
+			if(fix_bus)
+				pcibios_fixup_device_resources(dev, bus);
+			pci_read_irq_line(dev);
+			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+				struct resource *r = &dev->resource[i];
+
+				if (r->parent || !r->start || !r->flags)
+					continue;
+				pci_claim_resource(dev, i);
+			}
+		}
+	}
+}
+
+static int
+pcibios_pci_config_bridge(struct pci_dev *dev)
+{
+	u8 sec_busno;
+	struct pci_bus *child_bus;
+	struct pci_dev *child_dev;
+
+	/* Get busno of downstream bus */
+	pci_read_config_byte(dev, PCI_SECONDARY_BUS, &sec_busno);
+
+	/* Add to children of PCI bridge dev->bus */
+	child_bus = pci_add_new_bus(dev->bus, dev, sec_busno);
+	if (!child_bus) {
+		printk (KERN_ERR "%s: could not add second bus\n", __FUNCTION__);
+		return -EIO;
+	}
+	sprintf(child_bus->name, "PCI Bus #%02x", child_bus->number);
+
+	pci_scan_child_bus(child_bus);
+
+	list_for_each_entry(child_dev, &child_bus->devices, bus_list) {
+		eeh_add_device_late(child_dev);
+	}
+
+	/* Fixup new pci devices without touching bus struct */
+	pcibios_fixup_new_pci_devices(child_bus, 0);
+
+	/* Make the discovered devices available */
+	pci_bus_add_devices(child_bus);
+	return 0;
+}
+
+/**
+ * pcibios_add_pci_devices - adds new pci devices to bus
+ *
+ * This routine will find and fixup new pci devices under
+ * the indicated bus. This routine presumes that there
+ * might already be some devices under this bridge, so
+ * it carefully tries to add only new devices.  (And that
+ * is how this routine differs from other, similar pcibios
+ * routines.)
+ */
+void
+pcibios_add_pci_devices(struct pci_bus * bus)
+{
+	int slotno, num;
+	struct pci_dev *dev;
+	struct device_node *dn = pci_bus_to_OF_node(bus);
+
+	eeh_add_device_tree_early(dn);
+
+	/* pci_scan_slot should find all children */
+	slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
+	num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
+	if (num) {
+		pcibios_fixup_new_pci_devices(bus, 1);
+		pci_bus_add_devices(bus);
+	}
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		eeh_add_device_late (dev);
+		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+			pcibios_pci_config_bridge(dev);
+	}
+}
Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_pci.c	2005-11-02 14:36:51.785840999 -0600
+++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_pci.c	2005-11-02 14:39:24.730395724 -0600
@@ -32,36 +32,6 @@
 #include "../pci.h"		/* for pci_add_new_bus */
 #include "rpaphp.h"
 
-static struct pci_bus *find_bus_among_children(struct pci_bus *bus,
-					struct device_node *dn)
-{
-	struct pci_bus *child = NULL;
-	struct list_head *tmp;
-	struct device_node *busdn;
-
-	busdn = pci_bus_to_OF_node(bus);
-	if (busdn == dn)
-		return bus;
-
-	list_for_each(tmp, &bus->children) {
-		child = find_bus_among_children(pci_bus_b(tmp), dn);
-		if (child)
-			break;
-	}
-	return child;
-}
-
-struct pci_bus *rpaphp_find_pci_bus(struct device_node *dn)
-{
-	struct pci_dn *pdn = dn->data;
-
-	if (!pdn  || !pdn->phb || !pdn->phb->bus)
-		return NULL;
-
-	return find_bus_among_children(pdn->phb->bus, dn);
-}
-EXPORT_SYMBOL_GPL(rpaphp_find_pci_bus);
-
 static int rpaphp_get_sensor_state(struct slot *slot, int *state)
 {
 	int rc;
@@ -120,7 +90,7 @@
 			/* config/unconfig adapter */
 			*value = slot->state;
 		} else {
-			bus = rpaphp_find_pci_bus(slot->dn);
+			bus = pcibios_find_pci_bus(slot->dn);
 			if (bus && !list_empty(&bus->devices))
 				*value = CONFIGURED;
 			else
@@ -131,117 +101,6 @@
 	return rc;
 }
 
-/* Must be called before pci_bus_add_devices */
-static void 
-rpaphp_fixup_new_pci_devices(struct pci_bus *bus, int fix_bus)
-{
-	struct pci_dev *dev;
-
-	list_for_each_entry(dev, &bus->devices, bus_list) {
-		/*
-		 * Skip already-present devices (which are on the
-		 * global device list.)
-		 */
-		if (list_empty(&dev->global_list)) {
-			int i;
-			
-			/* Need to setup IOMMU tables */
-			ppc_md.iommu_dev_setup(dev);
-
-			if(fix_bus)
-				pcibios_fixup_device_resources(dev, bus);
-			pci_read_irq_line(dev);
-			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
-				struct resource *r = &dev->resource[i];
-
-				if (r->parent || !r->start || !r->flags)
-					continue;
-				pci_claim_resource(dev, i);
-			}
-		}
-	}
-}
-
-static int rpaphp_pci_config_bridge(struct pci_dev *dev)
-{
-	u8 sec_busno;
-	struct pci_bus *child_bus;
-	struct pci_dev *child_dev;
-
-	dbg("Enter %s:  BRIDGE dev=%s\n", __FUNCTION__, pci_name(dev));
-
-	/* get busno of downstream bus */
-	pci_read_config_byte(dev, PCI_SECONDARY_BUS, &sec_busno);
-		
-	/* add to children of PCI bridge dev->bus */
-	child_bus = pci_add_new_bus(dev->bus, dev, sec_busno);
-	if (!child_bus) {
-		err("%s: could not add second bus\n", __FUNCTION__);
-		return -EIO;
-	}
-	sprintf(child_bus->name, "PCI Bus #%02x", child_bus->number);
-	/* do pci_scan_child_bus */
-	pci_scan_child_bus(child_bus);
-
-	list_for_each_entry(child_dev, &child_bus->devices, bus_list) {
-		eeh_add_device_late(child_dev);
-	}
-
-	 /* fixup new pci devices without touching bus struct */
-	rpaphp_fixup_new_pci_devices(child_bus, 0);
-
-	/* Make the discovered devices available */
-	pci_bus_add_devices(child_bus);
-	return 0;
-}
-
-/*****************************************************************************
- rpaphp_pci_config_slot() will  configure all devices under the
- given slot->dn and return the the first pci_dev.
- *****************************************************************************/
-int
-rpaphp_config_pci_adapter(struct pci_bus *bus)
-{
-	struct device_node *dn = pci_bus_to_OF_node(bus);
-	struct pci_dev *dev = NULL;
-	int rc = -ENODEV;
-	int slotno;
-	int num;
-
-	dbg("Enter %s: dn=%s bus=%s\n", __FUNCTION__, dn->full_name, bus->name);
-	if (!dn || !dn->child)
-		goto exit;
-
-	eeh_add_device_tree_early(dn);
-	
-	slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
-
-	/* pci_scan_slot should find all children */
-	num = pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
-	if (num) {
-		rpaphp_fixup_new_pci_devices(bus, 1);
-		pci_bus_add_devices(bus);
-	}
-	if (list_empty(&bus->devices)) {
-		err("%s: No new device found\n", __FUNCTION__);
-		goto exit;
-	}
-	list_for_each_entry(dev, &bus->devices, bus_list) {
-		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
-			rpaphp_pci_config_bridge(dev);
-	}
-
-	dbg("%s: pci_devs of slot[%s]\n", __FUNCTION__, dn->full_name);
-	list_for_each_entry (dev, &bus->devices, bus_list)
-		dbg("\t%s\n", pci_name(dev));
-
-	rc = 0;
-exit:
-	dbg("Exit %s:  rc=%d\n", __FUNCTION__, rc);
-	return rc;
-}
-EXPORT_SYMBOL_GPL(rpaphp_config_pci_adapter);
-
 static void print_slot_pci_funcs(struct pci_bus *bus)
 {
 	struct device_node *dn;
@@ -257,17 +116,6 @@
 	return;
 }
 
-int rpaphp_unconfig_pci_adapter(struct pci_bus *bus)
-{
-	struct pci_dev *dev, *tmp;
-
-	list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
-		eeh_remove_bus_device(dev);
-		pci_remove_bus_device(dev);
-	}
-	return 0;
-}
-
 static int setup_pci_hotplug_slot_info(struct slot *slot)
 {
 	dbg("%s Initilize the PCI slot's hotplug->info structure ...\n",
@@ -303,7 +151,7 @@
 	struct pci_bus *bus;
 
 	BUG_ON(!dn);
-	bus = rpaphp_find_pci_bus(dn);
+	bus = pcibios_find_pci_bus(dn);
 	if (!bus) {
 		err("%s: no pci_bus for dn %s\n", __FUNCTION__, dn->full_name);
 		goto exit_rc;
@@ -328,10 +176,7 @@
 		if (slot->hotplug_slot->info->adapter_status == NOT_CONFIGURED) {
 			dbg("%s CONFIGURING pci adapter in slot[%s]\n",  
 				__FUNCTION__, slot->name);
-			if (rpaphp_config_pci_adapter(slot->bus)) {
-				err("%s: CONFIG pci adapter failed\n", __FUNCTION__);
-				goto exit_rc;		
-			}
+			pcibios_add_pci_devices(slot->bus);
 
 		} else if (slot->hotplug_slot->info->adapter_status != CONFIGURED) {
 			err("%s: slot[%s]'s adapter_status is NOT_VALID.\n",
@@ -377,16 +222,10 @@
 	/* if slot is not empty, enable the adapter */
 	if (state == PRESENT) {
 		dbg("%s : slot[%s] is occupied.\n", __FUNCTION__, slot->name);
-		retval = rpaphp_config_pci_adapter(slot->bus);
-		if (!retval) {
-			slot->state = CONFIGURED;
-			dbg("%s: PCI devices in slot[%s] has been configured\n", 
+		pcibios_add_pci_devices(slot->bus);
+		slot->state = CONFIGURED;
+		dbg("%s: PCI devices in slot[%s] has been configured\n", 
 				__FUNCTION__, slot->name);
-		} else {
-			slot->state = NOT_CONFIGURED;
-			dbg("%s: no pci_dev struct for adapter in slot[%s]\n",
-			    __FUNCTION__, slot->name);
-		}
 	} else if (state == EMPTY) {
 		dbg("%s : slot[%s] is empty\n", __FUNCTION__, slot->name);
 		slot->state = EMPTY;
Index: linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpadlpar_core.c	2005-11-02 14:36:51.793839877 -0600
+++ linux-2.6.14-git3/drivers/pci/hotplug/rpadlpar_core.c	2005-11-02 14:39:24.737394743 -0600
@@ -197,9 +197,8 @@
 static int dlpar_add_pci_slot(char *drc_name, struct device_node *dn)
 {
 	struct pci_dev *dev;
-	int rc;
 
-	if (rpaphp_find_pci_bus(dn))
+	if (pcibios_find_pci_bus(dn))
 		return -EINVAL;
 
 	/* Add pci bus */
@@ -211,12 +210,7 @@
 	}
 
 	if (dn->child) {
-		rc = rpaphp_config_pci_adapter(dev->subordinate);
-		if (rc < 0) {
-			printk(KERN_ERR "%s: unable to enable slot %s\n",
-				__FUNCTION__, drc_name);
-			return -EIO;
-		}
+		pcibios_add_pci_devices(dev->subordinate);
 	}
 
 	/* Add hotplug slot */
@@ -255,7 +249,7 @@
 	struct pci_dn *pdn;
 	int rc = 0;
 
-	if (!rpaphp_find_pci_bus(dn))
+	if (!pcibios_find_pci_bus(dn))
 		return -EINVAL;
 
 	slot = find_slot(dn);
@@ -400,7 +394,7 @@
 	struct pci_bus *bus;
 	struct slot *slot;
 
-	bus = rpaphp_find_pci_bus(dn);
+	bus = pcibios_find_pci_bus(dn);
 	if (!bus)
 		return -EINVAL;
 
Index: linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_core.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/pci/hotplug/rpaphp_core.c	2005-11-02 14:28:55.984544585 -0600
+++ linux-2.6.14-git3/drivers/pci/hotplug/rpaphp_core.c	2005-11-02 14:39:24.744393761 -0600
@@ -426,7 +426,8 @@
 
 	dbg("DISABLING SLOT %s\n", slot->name);
 	down(&rpaphp_sem);
-	retval = rpaphp_unconfig_pci_adapter(slot->bus);
+	pcibios_remove_pci_devices(slot->bus);
+	retval = 0;
 	up(&rpaphp_sem);
 	slot->state = NOT_CONFIGURED;
 	info("%s: devices in slot[%s] unconfigured.\n", __FUNCTION__,
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:32:55.306995693 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:40:05.531674439 -0600
@@ -1,6 +1,6 @@
 obj-y			:= pci.o lpar.o hvCall.o nvram.o reconfig.o \
-			   setup.o iommu.o rtas-fw.o ras.o
+			   setup.o iommu.o rtas-fw.o ras.o pci_dlpar.o
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_IBMVIO)	+= vio.o
 obj-$(CONFIG_XICS)	+= xics.o
-obj-$(CONFIG_EEH)    += eeh.o eeh_event.o
+obj-$(CONFIG_EEH)    += eeh.o eeh_event.o 
Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h	2005-11-02 14:28:55.984544585 -0600
+++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h	2005-11-02 14:39:24.755392219 -0600
@@ -121,9 +121,18 @@
 		return bus->sysdata; /* Must be root bus (PHB) */
 }
 
+/** Find the bus corresponding to the indicated device node */
+struct pci_bus * pcibios_find_pci_bus(struct device_node *dn);
+
 extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 					 struct device_node *dev, int primary);
 
+/** Remove all of the PCI devices under this bus */
+void pcibios_remove_pci_devices(struct pci_bus *bus);
+
+/** Discover new pci devices under this bus, and add them */
+void pcibios_add_pci_devices(struct pci_bus * bus);
+
 extern int pcibios_remove_root_bus(struct pci_controller *phb);
 
 extern void phbs_remap_io(void);

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 24/42]: ppc64: PCI Error Recovery: PPC64 core recovery routines
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (22 preceding siblings ...)
  2005-11-04  0:52 ` [PATCH 23/42]: ppc64: migrate common PCI " Linas Vepstas
@ 2005-11-04  0:52 ` Linas Vepstas
  2005-11-04  0:53 ` [PATCH 25/42]: ppc64: Split out PCI address cache to its own file Linas Vepstas
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:52 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

Various PCI bus errors can be signaled by newer PCI controllers.  The
core error recovery routines are architecture dependent.  This patch adds
a recovery infrastructure for the  PPC64 pSeries systems.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:36:41.255317484 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:41:18.427452474 -0600
@@ -485,6 +485,11 @@
 		if (PCI_DN(dn)) {
 			PCI_DN(dn)->eeh_mode |= mode_flag;
 
+			/* Mark the pci device driver too */
+			struct pci_dev *dev = PCI_DN(dn)->pcidev;
+			if (dev && dev->driver)
+				dev->error_state = pci_channel_io_frozen;
+
 			if (dn->child)
 				__eeh_mark_slot (dn->child, mode_flag);
 		}
@@ -544,6 +549,7 @@
 	int rets[3];
 	unsigned long flags;
 	struct pci_dn *pdn;
+	enum pci_channel_state state;
 	int rc = 0;
 
 	__get_cpu_var(total_mmio_ffs)++;
@@ -648,8 +654,13 @@
 	eeh_mark_slot (dn, EEH_MODE_ISOLATED);
 	spin_unlock_irqrestore(&confirm_error_lock, flags);
 
-	eeh_send_failure_event (dn, dev, rets[0], rets[2]);
-	
+	state = pci_channel_io_normal;
+	if ((rets[0] == 2) || (rets[0] == 4))
+		state = pci_channel_io_frozen;
+	if (rets[0] == 5)
+		state = pci_channel_io_perm_failure;
+	eeh_send_failure_event (dn, dev, state, rets[2]);
+
 	/* Most EEH events are due to device driver bugs.  Having
 	 * a stack trace will help the device-driver authors figure
 	 * out what happened.  So print that out. */
@@ -953,8 +964,10 @@
 	 * But there are a few cases like display devices that make sense.
 	 */
 	enable = 1;	/* i.e. we will do checking */
+#if 0
 	if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY)
 		enable = 0;
+#endif
 
 	if (!enable)
 		pdn->eeh_mode |= EEH_MODE_NOCHECK;
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c	2005-11-02 14:41:18.435451353 -0600
@@ -0,0 +1,366 @@
+/*
+ * PCI Error Recovery Driver for RPA-compliant PPC64 platform.
+ * Copyright (C) 2004, 2005 Linas Vepstas <linas@linas.org>
+ *
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * Send feedback to <linas@us.ibm.com>
+ *
+ */
+#include <linux/delay.h>
+#include <linux/irq.h>
+#include <linux/interrupt.h>
+#include <linux/notifier.h>
+#include <linux/pci.h>
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/ppc-pci.h>
+#include <asm/pci-bridge.h>
+#include <asm/prom.h>
+#include <asm/rtas.h>
+
+
+static inline const char * pcid_name (struct pci_dev *pdev)
+{
+	if (pdev->dev.driver)
+		return pdev->dev.driver->name;
+	return "";
+}
+
+/**
+ * Return the "partitionable endpoint" (pe) under which this device lies
+ */
+static struct device_node * find_device_pe(struct device_node *dn)
+{
+	while ((dn->parent) && PCI_DN(dn->parent) &&
+	      (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
+		dn = dn->parent;
+	}
+	return dn;
+}
+
+
+#ifdef DEBUG
+static void print_device_node_tree (struct pci_dn *pdn, int dent)
+{
+	int i;
+	if (!pdn) return;
+	for (i=0;i<dent; i++)
+		printk(" ");
+	printk("dn=%s mode=%x \tcfg_addr=%x pe_addr=%x \tfull=%s\n",
+		pdn->node->name, pdn->eeh_mode, pdn->eeh_config_addr,
+		pdn->eeh_pe_config_addr, pdn->node->full_name);
+	dent += 3;
+	struct device_node *pc = pdn->node->child;
+	while (pc) {
+		print_device_node_tree(PCI_DN(pc), dent);
+		pc = pc->sibling;
+	}
+}
+#endif
+
+/** 
+ * irq_in_use - return true if this irq is being used 
+ */
+static int irq_in_use(unsigned int irq)
+{
+	int rc = 0;
+	unsigned long flags;
+   struct irq_desc *desc = irq_desc + irq;
+
+	spin_lock_irqsave(&desc->lock, flags);
+	if (desc->action)
+		rc = 1;
+	spin_unlock_irqrestore(&desc->lock, flags);
+	return rc;
+}
+
+/* ------------------------------------------------------- */
+/** eeh_report_error - report an EEH error to each device,
+ *  collect up and merge the device responses.
+ */
+
+static void eeh_report_error(struct pci_dev *dev, void *userdata)
+{
+	enum pcierr_result rc, *res = userdata;
+	struct pci_driver *driver = dev->driver;
+
+	dev->error_state = pci_channel_io_frozen;
+
+	if (!driver)
+		return;
+
+	if (irq_in_use (dev->irq)) {
+		struct device_node *dn = pci_device_to_OF_node(dev);
+		PCI_DN(dn)->eeh_mode |= EEH_MODE_IRQ_DISABLED;
+		disable_irq_nosync(dev->irq);
+	}
+	if (!driver->err_handler)
+		return;
+	if (!driver->err_handler->error_detected)
+		return;
+
+	rc = driver->err_handler->error_detected (dev, pci_channel_io_frozen);
+	if (*res == PCIERR_RESULT_NONE) *res = rc;
+	if (*res == PCIERR_RESULT_NEED_RESET) return;
+	if (*res == PCIERR_RESULT_DISCONNECT &&
+	     rc == PCIERR_RESULT_NEED_RESET) *res = rc;
+}
+
+/** eeh_report_reset -- tell this device that the pci slot
+ *  has been reset.
+ */
+
+static void eeh_report_reset(struct pci_dev *dev, void *userdata)
+{
+	struct pci_driver *driver = dev->driver;
+	struct device_node *dn = pci_device_to_OF_node(dev);
+
+	if (!driver)
+		return;
+
+	if ((PCI_DN(dn)->eeh_mode) & EEH_MODE_IRQ_DISABLED) {
+		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_IRQ_DISABLED;
+		enable_irq(dev->irq);
+	}
+	if (!driver->err_handler)
+		return;
+	if (!driver->err_handler->slot_reset)
+		return;
+
+	driver->err_handler->slot_reset(dev);
+}
+
+static void eeh_report_resume(struct pci_dev *dev, void *userdata)
+{
+	struct pci_driver *driver = dev->driver;
+
+	dev->error_state = pci_channel_io_normal;
+
+	if (!driver)
+		return;
+	if (!driver->err_handler)
+		return;
+	if (!driver->err_handler->resume)
+		return;
+
+	driver->err_handler->resume(dev);
+}
+
+static void eeh_report_failure(struct pci_dev *dev, void *userdata)
+{
+	struct pci_driver *driver = dev->driver;
+
+	dev->error_state = pci_channel_io_perm_failure;
+
+	if (!driver)
+		return;
+
+	if (irq_in_use (dev->irq)) {
+		struct device_node *dn = pci_device_to_OF_node(dev);
+		PCI_DN(dn)->eeh_mode |= EEH_MODE_IRQ_DISABLED;
+		disable_irq_nosync(dev->irq);
+	}
+	if (!driver->err_handler)
+		return;
+	if (!driver->err_handler->error_detected)
+		return;
+	driver->err_handler->error_detected(dev, pci_channel_io_perm_failure);
+}
+
+/* ------------------------------------------------------- */
+/**
+ * handle_eeh_events -- reset a PCI device after hard lockup.
+ *
+ * pSeries systems will isolate a PCI slot if the PCI-Host
+ * bridge detects address or data parity errors, DMA's
+ * occuring to wild addresses (which usually happen due to
+ * bugs in device drivers or in PCI adapter firmware).
+ * Slot isolations also occur if #SERR, #PERR or other misc
+ * PCI-related errors are detected.
+ *
+ * Recovery process consists of unplugging the device driver
+ * (which generated hotplug events to userspace), then issuing
+ * a PCI #RST to the device, then reconfiguring the PCI config
+ * space for all bridges & devices under this slot, and then
+ * finally restarting the device drivers (which cause a second
+ * set of hotplug events to go out to userspace).
+ */
+
+/**
+ * eeh_reset_device() -- perform actual reset of a pci slot
+ * Args: bus: pointer to the pci bus structure corresponding
+ *            to the isolated slot. A non-null value will
+ *            cause all devices under the bus to be removed
+ *            and then re-added.
+ *     pe_dn: pointer to a "Partionable Endpoint" device node.
+ *            This is the top-level structure on which pci
+ *            bus resets can be performed.
+ */
+
+static void eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus)
+{
+	if (bus)
+		pcibios_remove_pci_devices(bus);
+
+	/* Reset the pci controller. (Asserts RST#; resets config space).
+	 * Reconfigure bridges and devices */
+	rtas_set_slot_reset(pe_dn);
+
+	/* Walk over all functions on this device */
+	rtas_configure_bridge(pe_dn);
+	eeh_restore_bars(pe_dn);
+
+	/* Give the system 5 seconds to finish running the user-space
+	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes, 
+	 * this is a hack, but if we don't do this, and try to bring 
+	 * the device up before the scripts have taken it down, 
+	 * potentially weird things happen.
+	 */
+	if (bus) {
+		ssleep (5);
+		pcibios_add_pci_devices(bus);
+	}
+}
+
+/* The longest amount of time to wait for a pci device
+ * to come back on line, in seconds.
+ */
+#define MAX_WAIT_FOR_RECOVERY 15
+
+void handle_eeh_events (struct eeh_event *event)
+{
+	struct device_node *frozen_dn;
+	struct pci_dn *frozen_pdn;
+	struct pci_bus *frozen_bus;
+	int perm_failure = 0;
+
+	frozen_dn = find_device_pe(event->dn);
+	frozen_bus = pcibios_find_pci_bus(frozen_dn);
+
+	if (!frozen_dn) {
+		printk(KERN_ERR "EEH: Error: Cannot find partition endpoint for %s\n",
+		        pci_name(event->dev));
+		return;
+	}
+
+	/* There are two different styles for coming up with the PE.
+	 * In the old style, it was the highest EEH-capable device
+	 * which was always an EADS pci bridge.  In the new style,
+	 * there might not be any EADS bridges, and even when there are,
+	 * the firmware marks them as "EEH incapable". So another
+	 * two-step is needed to find the pci bus.. */
+	if (!frozen_bus)
+		frozen_bus = pcibios_find_pci_bus (frozen_dn->parent);
+
+	if (!frozen_bus) {
+		printk(KERN_ERR "EEH: Cannot find PCI bus for %s\n",
+		        frozen_dn->full_name);
+		return;
+	}
+
+#if 0
+	/* We may get "permanent failure" messages on empty slots.
+	 * These are false alarms. Empty slots have no child dn. */
+	if ((event->state == pci_channel_io_perm_failure) && (frozen_device == NULL))
+		return;
+#endif
+
+	frozen_pdn = PCI_DN(frozen_dn);
+	frozen_pdn->eeh_freeze_count++;
+	
+	if (frozen_pdn->eeh_freeze_count > EEH_MAX_ALLOWED_FREEZES)
+		perm_failure = 1;
+
+	/* If the reset state is a '5' and the time to reset is 0 (infinity)
+	 * or is more then 15 seconds, then mark this as a permanent failure.
+	 */
+	if ((event->state == pci_channel_io_perm_failure) &&
+	    ((event->time_unavail <= 0) ||
+	     (event->time_unavail > MAX_WAIT_FOR_RECOVERY*1000)))
+	{
+		perm_failure = 1;
+	}
+
+	/* Log the error with the rtas logger. */
+	if (perm_failure) {
+		/*
+		 * About 90% of all real-life EEH failures in the field
+		 * are due to poorly seated PCI cards. Only 10% or so are
+		 * due to actual, failed cards.
+		 */
+		printk(KERN_ERR
+		   "EEH: PCI device %s - %s has failed %d times \n"
+		   "and has been permanently disabled.  Please try reseating\n"
+		   "this device or replacing it.\n",
+			pci_name (frozen_pdn->pcidev), 
+			pcid_name(frozen_pdn->pcidev), 
+			frozen_pdn->eeh_freeze_count);
+
+		eeh_slot_error_detail(frozen_pdn, 2 /* Permanent Error */);
+
+		/* Notify all devices that they're about to go down. */
+		pci_walk_bus(frozen_bus, eeh_report_failure, 0);
+
+		/* Shut down the device drivers for good. */
+		pcibios_remove_pci_devices(frozen_bus);
+		return;
+	}
+
+	eeh_slot_error_detail(frozen_pdn, 1 /* Temporary Error */);
+	printk(KERN_WARNING
+	   "EEH: This PCI device has failed %d times since last reboot: %s - %s\n",
+		frozen_pdn->eeh_freeze_count,
+		pci_name (frozen_pdn->pcidev), 
+		pcid_name(frozen_pdn->pcidev));
+
+	/* Walk the various device drivers attached to this slot through
+	 * a reset sequence, giving each an opportunity to do what it needs
+	 * to accomplish the reset.  Each child gets a report of the
+	 * status ... if any child can't handle the reset, then the entire
+	 * slot is dlpar removed and added.
+	 */
+	enum pcierr_result result = PCIERR_RESULT_NONE;
+	pci_walk_bus(frozen_bus, eeh_report_error, &result);
+
+	/* If all device drivers were EEH-unaware, then shut
+	 * down all of the device drivers, and hope they
+	 * go down willingly, without panicing the system.
+	 */
+	if (result == PCIERR_RESULT_NONE) {
+		eeh_reset_device(frozen_pdn, frozen_bus);
+	}
+
+	/* If any device called out for a reset, then reset the slot */
+	if (result == PCIERR_RESULT_NEED_RESET) {
+		eeh_reset_device(frozen_pdn, NULL);
+		pci_walk_bus(frozen_bus, eeh_report_reset, 0);
+	}
+
+	/* If all devices reported they can proceed, the re-enable PIO */
+	if (result == PCIERR_RESULT_CAN_RECOVER) {
+		/* XXX Not supported; we brute-force reset the device */
+		eeh_reset_device(frozen_pdn, NULL);
+		pci_walk_bus(frozen_bus, eeh_report_reset, 0);
+	}
+
+	/* Tell all device drivers that they can resume operations */
+	pci_walk_bus(frozen_bus, eeh_report_resume, 0);
+}
+
+/* ---------- end of file ---------- */
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_event.c	2005-11-02 14:32:35.731739983 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_event.c	2005-11-02 14:41:18.440450652 -0600
@@ -21,6 +21,7 @@
 #include <linux/list.h>
 #include <linux/pci.h>
 #include <asm/eeh_event.h>
+#include <asm/ppc-pci.h>
 
 /** Overview:
  *  EEH error states may be detected within exception handlers;
@@ -37,31 +38,6 @@
 DECLARE_WORK(eeh_event_wq, eeh_thread_launcher, NULL);
 
 /**
- * eeh_panic - call panic() for an eeh event that cannot be handled.
- * The philosophy of this routine is that it is better to panic and
- * halt the OS than it is to risk possible data corruption by
- * oblivious device drivers that don't know better.
- *
- * @dev pci device that had an eeh event
- * @reset_state current reset state of the device slot
- */
-static void eeh_panic(struct pci_dev *dev, int reset_state)
-{
-	/*
-	 * Since the panic_on_oops sysctl is used to halt the system
-	 * in light of potential corruption, we can use it here.
-	 */
-	if (panic_on_oops) {
-		panic("EEH: MMIO failure (%d) on device:%s\n", reset_state,
-		      pci_name(dev));
-	}
-	else {
-		printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n",
-		       reset_state, pci_name(dev));
-	}
-}
-
-/**
  * eeh_event_handler - dispatch EEH events.  The detection of a frozen
  * slot can occur inside an interrupt, where it can be hard to do
  * anything about it.  The goal of this routine is to pull these
@@ -82,10 +58,16 @@
 
 		spin_lock_irqsave(&eeh_eventlist_lock, flags);
 		event = NULL;
+
+		/* Unqueue the event, get ready to process. */
 		if (!list_empty(&eeh_eventlist)) {
 			event = list_entry(eeh_eventlist.next, struct eeh_event, list);
 			list_del(&event->list);
 		}
+		
+		if (event)
+			eeh_mark_slot(event->dn, EEH_MODE_RECOVERING);
+
 		spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
 		if (event == NULL)
 			break;
@@ -93,8 +75,11 @@
 		printk(KERN_INFO "EEH: Detected PCI bus error on device %s\n",
 		       pci_name(event->dev));
 
-		eeh_panic (event->dev, event->state);
+		handle_eeh_events(event);
+
+		eeh_clear_slot(event->dn, EEH_MODE_RECOVERING);
 
+		pci_dev_put(event->dev);
 		kfree(event);
 	}
 
@@ -122,7 +107,7 @@
  */
 int eeh_send_failure_event (struct device_node *dn,
                             struct pci_dev *dev,
-                            int state,
+                            enum pci_channel_state state,
                             int time_unavail)
 {
 	unsigned long flags;
Index: linux-2.6.14-git3/include/asm-powerpc/eeh_event.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/eeh_event.h	2005-11-02 14:32:35.718741805 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/eeh_event.h	2005-11-02 14:41:18.444450091 -0600
@@ -29,7 +29,7 @@
 	struct list_head     list;
 	struct device_node 	*dn;   /* struct device node */
 	struct pci_dev       *dev;  /* affected device */
-	int                  state;
+	enum pci_channel_state state; /* PCI bus state for the affected device */
 	int time_unavail;    /* milliseconds until device might be available */
 };
 
@@ -46,7 +46,10 @@
  */
 int eeh_send_failure_event (struct device_node *dn,
                             struct pci_dev *dev,
-                            int reset_state,
+                            enum pci_channel_state state,
                             int time_unavail);
 
+/* Main recovery function */
+void handle_eeh_events (struct eeh_event *);
+
 #endif /* ASM_PPC64_EEH_EVENT_H */
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:40:05.531674439 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:41:48.393250352 -0600
@@ -3,4 +3,4 @@
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_IBMVIO)	+= vio.o
 obj-$(CONFIG_XICS)	+= xics.o
-obj-$(CONFIG_EEH)    += eeh.o eeh_event.o 
+obj-$(CONFIG_EEH)    += eeh.o eeh_driver.o eeh_event.o
Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h	2005-11-02 14:35:39.295004776 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h	2005-11-02 14:41:18.454448689 -0600
@@ -54,6 +54,15 @@
 /* ---- EEH internal-use-only related routines ---- */
 #ifdef CONFIG_EEH
 /**
+ * eeh_slot_error_detail -- record and EEH error condition to the log
+ * @severity: 1 if temporary, 2 if permanent failure.
+ *
+ * Obtains the the EEH error details from the RTAS subsystem,
+ * and then logs these details with the RTAS error log system.
+ */
+void eeh_slot_error_detail (struct pci_dn *pdn, int severity);
+
+/**
  * rtas_set_slot_reset -- unfreeze a frozen slot
  *
  * Clear the EEH-frozen condition on a slot.  This routine
Index: linux-2.6.14-git3/include/asm-ppc64/eeh.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-ppc64/eeh.h	2005-11-02 14:36:41.263316362 -0600
+++ linux-2.6.14-git3/include/asm-ppc64/eeh.h	2005-11-02 14:41:18.461447707 -0600
@@ -31,9 +31,11 @@
 #ifdef CONFIG_EEH
 
 /* Values for eeh_mode bits in device_node */
-#define EEH_MODE_SUPPORTED	(1<<0)
-#define EEH_MODE_NOCHECK	(1<<1)
-#define EEH_MODE_ISOLATED	(1<<2)
+#define EEH_MODE_SUPPORTED     (1<<0)
+#define EEH_MODE_NOCHECK       (1<<1)
+#define EEH_MODE_ISOLATED      (1<<2)
+#define EEH_MODE_RECOVERING    (1<<3)
+#define EEH_MODE_IRQ_DISABLED  (1<<4)
 
 /* Max number of EEH freezes allowed before we consider the device
  * to be permanently disabled. */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 25/42]: ppc64: Split out PCI address cache to its own file
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (23 preceding siblings ...)
  2005-11-04  0:52 ` [PATCH 24/42]: ppc64: PCI Error Recovery: PPC64 core recovery routines Linas Vepstas
@ 2005-11-04  0:53 ` Linas Vepstas
  2005-11-04  0:53 ` [PATCH 26/42]: ppc64: Add "partion endpoint" support Linas Vepstas
                   ` (18 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:53 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

25-pci-address-cache.patch

The core EEH files is rather large. This patch splits out a self-contained
chunk of it into its own file.  This is the chunk that performes the 
caching and lookup of pci devices based on the i/o addresses of thier
resoures.  This code is almos archiecture-independent and could be 
used by any system that wanted to find a pci device based only on 
the i/o address used by the device.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:41:48.393250352 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:42:58.323443756 -0600
@@ -3,4 +3,4 @@
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_IBMVIO)	+= vio.o
 obj-$(CONFIG_XICS)	+= xics.o
-obj-$(CONFIG_EEH)    += eeh.o eeh_driver.o eeh_event.o
+obj-$(CONFIG_EEH)    += eeh.o eeh_cache.o eeh_driver.o eeh_event.o
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:41:18.427452474 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:42:38.986155538 -0600
@@ -77,9 +77,6 @@
  */
 #define EEH_MAX_FAILS	100000
 
-/* Misc forward declaraions */
-static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn);
-
 /* RTAS tokens */
 static int ibm_set_eeh_option;
 static int ibm_set_slot_reset;
@@ -107,296 +104,8 @@
 static DEFINE_PER_CPU(unsigned long, ignored_failures);
 static DEFINE_PER_CPU(unsigned long, slot_resets);
 
-/**
- * The pci address cache subsystem.  This subsystem places
- * PCI device address resources into a red-black tree, sorted
- * according to the address range, so that given only an i/o
- * address, the corresponding PCI device can be **quickly**
- * found. It is safe to perform an address lookup in an interrupt
- * context; this ability is an important feature.
- *
- * Currently, the only customer of this code is the EEH subsystem;
- * thus, this code has been somewhat tailored to suit EEH better.
- * In particular, the cache does *not* hold the addresses of devices
- * for which EEH is not enabled.
- *
- * (Implementation Note: The RB tree seems to be better/faster
- * than any hash algo I could think of for this problem, even
- * with the penalty of slow pointer chases for d-cache misses).
- */
-struct pci_io_addr_range
-{
-	struct rb_node rb_node;
-	unsigned long addr_lo;
-	unsigned long addr_hi;
-	struct pci_dev *pcidev;
-	unsigned int flags;
-};
-
-static struct pci_io_addr_cache
-{
-	struct rb_root rb_root;
-	spinlock_t piar_lock;
-} pci_io_addr_cache_root;
-
-static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr)
-{
-	struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node;
-
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-
-		if (addr < piar->addr_lo) {
-			n = n->rb_left;
-		} else {
-			if (addr > piar->addr_hi) {
-				n = n->rb_right;
-			} else {
-				pci_dev_get(piar->pcidev);
-				return piar->pcidev;
-			}
-		}
-	}
-
-	return NULL;
-}
-
-/**
- * pci_get_device_by_addr - Get device, given only address
- * @addr: mmio (PIO) phys address or i/o port number
- *
- * Given an mmio phys address, or a port number, find a pci device
- * that implements this address.  Be sure to pci_dev_put the device
- * when finished.  I/O port numbers are assumed to be offset
- * from zero (that is, they do *not* have pci_io_addr added in).
- * It is safe to call this function within an interrupt.
- */
-static struct pci_dev *pci_get_device_by_addr(unsigned long addr)
-{
-	struct pci_dev *dev;
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	dev = __pci_get_device_by_addr(addr);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-	return dev;
-}
-
-#ifdef DEBUG
-/*
- * Handy-dandy debug print routine, does nothing more
- * than print out the contents of our addr cache.
- */
-static void pci_addr_cache_print(struct pci_io_addr_cache *cache)
-{
-	struct rb_node *n;
-	int cnt = 0;
-
-	n = rb_first(&cache->rb_root);
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-		printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n",
-		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
-		       piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev));
-		cnt++;
-		n = rb_next(n);
-	}
-}
-#endif
-
-/* Insert address range into the rb tree. */
-static struct pci_io_addr_range *
-pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
-		      unsigned long ahi, unsigned int flags)
-{
-	struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node;
-	struct rb_node *parent = NULL;
-	struct pci_io_addr_range *piar;
-
-	/* Walk tree, find a place to insert into tree */
-	while (*p) {
-		parent = *p;
-		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
-		if (ahi < piar->addr_lo) {
-			p = &parent->rb_left;
-		} else if (alo > piar->addr_hi) {
-			p = &parent->rb_right;
-		} else {
-			if (dev != piar->pcidev ||
-			    alo != piar->addr_lo || ahi != piar->addr_hi) {
-				printk(KERN_WARNING "PIAR: overlapping address range\n");
-			}
-			return piar;
-		}
-	}
-	piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
-	if (!piar)
-		return NULL;
-
-	piar->addr_lo = alo;
-	piar->addr_hi = ahi;
-	piar->pcidev = dev;
-	piar->flags = flags;
-
-#ifdef DEBUG
-	printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n",
-	                  alo, ahi, pci_name (dev));
-#endif
-
-	rb_link_node(&piar->rb_node, parent, p);
-	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
-
-	return piar;
-}
-
-static void __pci_addr_cache_insert_device(struct pci_dev *dev)
-{
-	struct device_node *dn;
-	struct pci_dn *pdn;
-	int i;
-	int inserted = 0;
-
-	dn = pci_device_to_OF_node(dev);
-	if (!dn) {
-		printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev));
-		return;
-	}
-
-	/* Skip any devices for which EEH is not enabled. */
-	pdn = PCI_DN(dn);
-	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
-	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
-#ifdef DEBUG
-		printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n",
-		       pci_name(dev), pdn->node->full_name);
-#endif
-		return;
-	}
-
-	/* The cache holds a reference to the device... */
-	pci_dev_get(dev);
-
-	/* Walk resources on this device, poke them into the tree */
-	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
-		unsigned long start = pci_resource_start(dev,i);
-		unsigned long end = pci_resource_end(dev,i);
-		unsigned int flags = pci_resource_flags(dev,i);
-
-		/* We are interested only bus addresses, not dma or other stuff */
-		if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM)))
-			continue;
-		if (start == 0 || ~start == 0 || end == 0 || ~end == 0)
-			 continue;
-		pci_addr_cache_insert(dev, start, end, flags);
-		inserted = 1;
-	}
-
-	/* If there was nothing to add, the cache has no reference... */
-	if (!inserted)
-		pci_dev_put(dev);
-}
-
-/**
- * pci_addr_cache_insert_device - Add a device to the address cache
- * @dev: PCI device whose I/O addresses we are interested in.
- *
- * In order to support the fast lookup of devices based on addresses,
- * we maintain a cache of devices that can be quickly searched.
- * This routine adds a device to that cache.
- */
-static void pci_addr_cache_insert_device(struct pci_dev *dev)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	__pci_addr_cache_insert_device(dev);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-}
-
-static inline void __pci_addr_cache_remove_device(struct pci_dev *dev)
-{
-	struct rb_node *n;
-	int removed = 0;
-
-restart:
-	n = rb_first(&pci_io_addr_cache_root.rb_root);
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-
-		if (piar->pcidev == dev) {
-			rb_erase(n, &pci_io_addr_cache_root.rb_root);
-			removed = 1;
-			kfree(piar);
-			goto restart;
-		}
-		n = rb_next(n);
-	}
-
-	/* The cache no longer holds its reference to this device... */
-	if (removed)
-		pci_dev_put(dev);
-}
-
-/**
- * pci_addr_cache_remove_device - remove pci device from addr cache
- * @dev: device to remove
- *
- * Remove a device from the addr-cache tree.
- * This is potentially expensive, since it will walk
- * the tree multiple times (once per resource).
- * But so what; device removal doesn't need to be that fast.
- */
-static void pci_addr_cache_remove_device(struct pci_dev *dev)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	__pci_addr_cache_remove_device(dev);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-}
-
-/**
- * pci_addr_cache_build - Build a cache of I/O addresses
- *
- * Build a cache of pci i/o addresses.  This cache will be used to
- * find the pci device that corresponds to a given address.
- * This routine scans all pci busses to build the cache.
- * Must be run late in boot process, after the pci controllers
- * have been scaned for devices (after all device resources are known).
- */
-void __init pci_addr_cache_build(void)
-{
-	struct device_node *dn;
-	struct pci_dev *dev = NULL;
-
-	if (!eeh_subsystem_enabled)
-		return;
-
-	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
-
-	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
-		/* Ignore PCI bridges ( XXX why ??) */
-		if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) {
-			continue;
-		}
-		pci_addr_cache_insert_device(dev);
-
-		/* Save the BAR's; firmware doesn't restore these after EEH reset */
-		dn = pci_device_to_OF_node(dev);
-		eeh_save_bars(dev, PCI_DN(dn));
-	}
-
-#ifdef DEBUG
-	/* Verify tree built up above, echo back the list of addrs. */
-	pci_addr_cache_print(&pci_io_addr_cache_root);
-#endif
-}
-
 /* --------------------------------------------------------------- */
-/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */
+/* Below lies the EEH event infrastructure */
 
 void eeh_slot_error_detail (struct pci_dn *pdn, int severity)
 {
@@ -880,7 +589,7 @@
  * PCI devices are added individuallly; but, for the restore,
  * an entire slot is reset at a time.
  */
-static void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn)
+void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn)
 {
 	int i;
 
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c	2005-11-02 14:42:38.994154417 -0600
@@ -0,0 +1,317 @@
+/*
+ * eeh_cache.c
+ * PCI address cache; allows the lookup of PCI devices based on I/O address
+ *
+ * Copyright (C) 2004 Linas Vepstas <linas@austin.ibm.com> IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/list.h>
+#include <linux/pci.h>
+#include <linux/rbtree.h>
+#include <linux/spinlock.h>
+#include <asm/atomic.h>
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+#include <asm/systemcfg.h>
+
+#undef DEBUG
+
+/**
+ * The pci address cache subsystem.  This subsystem places
+ * PCI device address resources into a red-black tree, sorted
+ * according to the address range, so that given only an i/o
+ * address, the corresponding PCI device can be **quickly**
+ * found. It is safe to perform an address lookup in an interrupt
+ * context; this ability is an important feature.
+ *
+ * Currently, the only customer of this code is the EEH subsystem;
+ * thus, this code has been somewhat tailored to suit EEH better.
+ * In particular, the cache does *not* hold the addresses of devices
+ * for which EEH is not enabled.
+ *
+ * (Implementation Note: The RB tree seems to be better/faster
+ * than any hash algo I could think of for this problem, even
+ * with the penalty of slow pointer chases for d-cache misses).
+ */
+struct pci_io_addr_range
+{
+	struct rb_node rb_node;
+	unsigned long addr_lo;
+	unsigned long addr_hi;
+	struct pci_dev *pcidev;
+	unsigned int flags;
+};
+
+static struct pci_io_addr_cache
+{
+	struct rb_root rb_root;
+	spinlock_t piar_lock;
+} pci_io_addr_cache_root;
+
+static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr)
+{
+	struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node;
+
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+
+		if (addr < piar->addr_lo) {
+			n = n->rb_left;
+		} else {
+			if (addr > piar->addr_hi) {
+				n = n->rb_right;
+			} else {
+				pci_dev_get(piar->pcidev);
+				return piar->pcidev;
+			}
+		}
+	}
+
+	return NULL;
+}
+
+/**
+ * pci_get_device_by_addr - Get device, given only address
+ * @addr: mmio (PIO) phys address or i/o port number
+ *
+ * Given an mmio phys address, or a port number, find a pci device
+ * that implements this address.  Be sure to pci_dev_put the device
+ * when finished.  I/O port numbers are assumed to be offset
+ * from zero (that is, they do *not* have pci_io_addr added in).
+ * It is safe to call this function within an interrupt.
+ */
+struct pci_dev *pci_get_device_by_addr(unsigned long addr)
+{
+	struct pci_dev *dev;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	dev = __pci_get_device_by_addr(addr);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+	return dev;
+}
+
+#ifdef DEBUG
+/*
+ * Handy-dandy debug print routine, does nothing more
+ * than print out the contents of our addr cache.
+ */
+static void pci_addr_cache_print(struct pci_io_addr_cache *cache)
+{
+	struct rb_node *n;
+	int cnt = 0;
+
+	n = rb_first(&cache->rb_root);
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+		printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n",
+		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
+		       piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev));
+		cnt++;
+		n = rb_next(n);
+	}
+}
+#endif
+
+/* Insert address range into the rb tree. */
+static struct pci_io_addr_range *
+pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
+		      unsigned long ahi, unsigned int flags)
+{
+	struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct pci_io_addr_range *piar;
+
+	/* Walk tree, find a place to insert into tree */
+	while (*p) {
+		parent = *p;
+		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
+		if (ahi < piar->addr_lo) {
+			p = &parent->rb_left;
+		} else if (alo > piar->addr_hi) {
+			p = &parent->rb_right;
+		} else {
+			if (dev != piar->pcidev ||
+			    alo != piar->addr_lo || ahi != piar->addr_hi) {
+				printk(KERN_WARNING "PIAR: overlapping address range\n");
+			}
+			return piar;
+		}
+	}
+	piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
+	if (!piar)
+		return NULL;
+
+	piar->addr_lo = alo;
+	piar->addr_hi = ahi;
+	piar->pcidev = dev;
+	piar->flags = flags;
+
+#ifdef DEBUG
+	printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n",
+	                  alo, ahi, pci_name (dev));
+#endif
+
+	rb_link_node(&piar->rb_node, parent, p);
+	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
+
+	return piar;
+}
+
+static void __pci_addr_cache_insert_device(struct pci_dev *dev)
+{
+	struct device_node *dn;
+	struct pci_dn *pdn;
+	int i;
+	int inserted = 0;
+
+	dn = pci_device_to_OF_node(dev);
+	if (!dn) {
+		printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev));
+		return;
+	}
+
+	/* Skip any devices for which EEH is not enabled. */
+	pdn = PCI_DN(dn);
+	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
+	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
+#ifdef DEBUG
+		printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n",
+		       pci_name(dev), pdn->node->full_name);
+#endif
+		return;
+	}
+
+	/* The cache holds a reference to the device... */
+	pci_dev_get(dev);
+
+	/* Walk resources on this device, poke them into the tree */
+	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+		unsigned long start = pci_resource_start(dev,i);
+		unsigned long end = pci_resource_end(dev,i);
+		unsigned int flags = pci_resource_flags(dev,i);
+
+		/* We are interested only bus addresses, not dma or other stuff */
+		if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM)))
+			continue;
+		if (start == 0 || ~start == 0 || end == 0 || ~end == 0)
+			 continue;
+		pci_addr_cache_insert(dev, start, end, flags);
+		inserted = 1;
+	}
+
+	/* If there was nothing to add, the cache has no reference... */
+	if (!inserted)
+		pci_dev_put(dev);
+}
+
+/**
+ * pci_addr_cache_insert_device - Add a device to the address cache
+ * @dev: PCI device whose I/O addresses we are interested in.
+ *
+ * In order to support the fast lookup of devices based on addresses,
+ * we maintain a cache of devices that can be quickly searched.
+ * This routine adds a device to that cache.
+ */
+void pci_addr_cache_insert_device(struct pci_dev *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	__pci_addr_cache_insert_device(dev);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+}
+
+static inline void __pci_addr_cache_remove_device(struct pci_dev *dev)
+{
+	struct rb_node *n;
+	int removed = 0;
+
+restart:
+	n = rb_first(&pci_io_addr_cache_root.rb_root);
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+
+		if (piar->pcidev == dev) {
+			rb_erase(n, &pci_io_addr_cache_root.rb_root);
+			removed = 1;
+			kfree(piar);
+			goto restart;
+		}
+		n = rb_next(n);
+	}
+
+	/* The cache no longer holds its reference to this device... */
+	if (removed)
+		pci_dev_put(dev);
+}
+
+/**
+ * pci_addr_cache_remove_device - remove pci device from addr cache
+ * @dev: device to remove
+ *
+ * Remove a device from the addr-cache tree.
+ * This is potentially expensive, since it will walk
+ * the tree multiple times (once per resource).
+ * But so what; device removal doesn't need to be that fast.
+ */
+void pci_addr_cache_remove_device(struct pci_dev *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	__pci_addr_cache_remove_device(dev);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+}
+
+/**
+ * pci_addr_cache_build - Build a cache of I/O addresses
+ *
+ * Build a cache of pci i/o addresses.  This cache will be used to
+ * find the pci device that corresponds to a given address.
+ * This routine scans all pci busses to build the cache.
+ * Must be run late in boot process, after the pci controllers
+ * have been scaned for devices (after all device resources are known).
+ */
+void __init pci_addr_cache_build(void)
+{
+	struct device_node *dn;
+	struct pci_dev *dev = NULL;
+
+	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
+
+	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
+		/* Ignore PCI bridges */
+		if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
+			continue;
+
+		pci_addr_cache_insert_device(dev);
+
+		/* Save the BAR's; firmware doesn't restore these after EEH reset */
+		dn = pci_device_to_OF_node(dev);
+		eeh_save_bars(dev, PCI_DN(dn));
+	}
+
+#ifdef DEBUG
+	/* Verify tree built up above, echo back the list of addrs. */
+	pci_addr_cache_print(&pci_io_addr_cache_root);
+#endif
+}
+
Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h	2005-11-02 14:41:18.454448689 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h	2005-11-02 14:42:38.998153856 -0600
@@ -53,6 +53,14 @@
 
 /* ---- EEH internal-use-only related routines ---- */
 #ifdef CONFIG_EEH
+
+void pci_addr_cache_insert_device(struct pci_dev *dev);
+void pci_addr_cache_remove_device(struct pci_dev *dev);
+void pci_addr_cache_build(void);
+struct pci_dev *pci_get_device_by_addr(unsigned long addr);
+
+void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn);
+
 /**
  * eeh_slot_error_detail -- record and EEH error condition to the log
  * @severity: 1 if temporary, 2 if permanent failure.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 26/42]: ppc64: Add "partion endpoint" support
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (24 preceding siblings ...)
  2005-11-04  0:53 ` [PATCH 25/42]: ppc64: Split out PCI address cache to its own file Linas Vepstas
@ 2005-11-04  0:53 ` Linas Vepstas
  2005-11-04  0:53 ` [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver Linas Vepstas
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:53 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

26-eeh-partition-endpoint.patch

New versions of firmware introduce a new method by which the 
"partition endpoint" (the point at which the pci bus is cut). 
This code adds the support for this (mandatory) new feature.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>


Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:42:38.986155538 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:43:49.212307192 -0600
@@ -83,6 +83,7 @@
 static int ibm_read_slot_reset_state;
 static int ibm_read_slot_reset_state2;
 static int ibm_slot_error_detail;
+static int ibm_get_config_addr_info;
 
 static int eeh_subsystem_enabled;
 
@@ -457,6 +458,7 @@
 static void
 rtas_pci_slot_reset(struct pci_dn *pdn, int state)
 {
+	int config_addr;
 	int rc;
 
 	BUG_ON (pdn==NULL); 
@@ -467,8 +469,13 @@
 		return;
 	}
 
+	/* Use PE configuration address, if present */
+	config_addr = pdn->eeh_config_addr;
+	if (pdn->eeh_pe_config_addr)
+		config_addr = pdn->eeh_pe_config_addr;
+
 	rc = rtas_call(ibm_set_slot_reset,4,1, NULL,
-	               pdn->eeh_config_addr,
+	               config_addr,
 	               BUID_HI(pdn->phb->buid),
 	               BUID_LO(pdn->phb->buid),
 	               state);
@@ -695,8 +702,22 @@
 			eeh_subsystem_enabled = 1;
 			pdn->eeh_mode |= EEH_MODE_SUPPORTED;
 			pdn->eeh_config_addr = regs[0];
+
+			/* If the newer, better, ibm,get-config-addr-info is supported, 
+			 * then use that instead. */
+			pdn->eeh_pe_config_addr = 0;
+			if (ibm_get_config_addr_info != RTAS_UNKNOWN_SERVICE) {
+				unsigned int rets[2];
+				ret = rtas_call (ibm_get_config_addr_info, 4, 2, rets, 
+					pdn->eeh_config_addr, 
+					info->buid_hi, info->buid_lo,
+					0);
+				if (ret == 0)
+					pdn->eeh_pe_config_addr = rets[0];
+			}
 #ifdef DEBUG
-			printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name);
+			printk(KERN_DEBUG "EEH: %s: eeh enabled, config=%x pe_config=%x\n",
+			       dn->full_name, pdn->eeh_config_addr, pdn->eeh_pe_config_addr);
 #endif
 		} else {
 
@@ -748,6 +769,7 @@
 	ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2");
 	ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state");
 	ibm_slot_error_detail = rtas_token("ibm,slot-error-detail");
+	ibm_get_config_addr_info = rtas_token("ibm,get-config-addr-info");
 
 	if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE)
 		return;
Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h	2005-11-02 14:39:24.755392219 -0600
+++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h	2005-11-02 14:43:49.218306351 -0600
@@ -63,6 +63,7 @@
 	int	devfn;			/* for pci devices */
 	int	eeh_mode;		/* See eeh.h for possible EEH_MODEs */
 	int	eeh_config_addr;
+	int	eeh_pe_config_addr; /* new-style partition endpoint address */
 	int 	eeh_check_count;	/* # times driver ignored error */
 	int 	eeh_freeze_count;	/* # times this device froze up. */
 	int	eeh_is_bridge;		/* device is pci-to-pci bridge */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (25 preceding siblings ...)
  2005-11-04  0:53 ` [PATCH 26/42]: ppc64: Add "partion endpoint" support Linas Vepstas
@ 2005-11-04  0:53 ` Linas Vepstas
  2005-11-04  0:53 ` [PATCH 28/42]: SCSI: add PCI error recovery to Symbios " Linas Vepstas
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:53 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

27-pci-error-recovery_IPR-driver.patch

Subject: PCI Error Recovery:  IPR SCSI device driver

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the IPR SCSI device driver.
The patch has been tested, and appears to work well.

Signed-off-by: Linas Vepstas <linas@linas.org>
Signed-off-by: Brian King <brking@us.ibm.com>

--
Index: linux-2.6.14-git3/drivers/scsi/ipr.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/scsi/ipr.c	2005-11-02 14:28:53.284922999 -0600
+++ linux-2.6.14-git3/drivers/scsi/ipr.c	2005-11-02 14:43:52.782806465 -0600
@@ -5328,6 +5328,94 @@
 				shutdown_type);
 }
 
+/* --------------- PCI Error Recovery infrastructure ----------- */
+/** If the PCI slot is frozen, hold off all i/o
+ *  activity; then, as soon as the slot is available again,
+ *  initiate an adapter reset.
+ */
+static int ipr_reset_freeze(struct ipr_cmnd *ipr_cmd)
+{
+	/* Disallow new interrupts, avoid loop */
+	ipr_cmd->ioa_cfg->allow_interrupts = 0;
+	list_add_tail(&ipr_cmd->queue, &ipr_cmd->ioa_cfg->pending_q);
+	ipr_cmd->done = ipr_reset_ioa_job;
+	return IPR_RC_JOB_RETURN;
+}
+
+/** ipr_eeh_frozen -- called when slot has experience PCI bus error.
+ *  This routine is called to tell us that the PCI bus is down.
+ *  Can't do anything here, except put the device driver into a
+ *  holding pattern, waiting for the PCI bus to come back.
+ */
+static void ipr_eeh_frozen (struct pci_dev *pdev)
+{
+	unsigned long flags = 0;
+	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
+
+	spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
+	_ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_freeze, IPR_SHUTDOWN_NONE);
+	spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
+}
+
+/** ipr_eeh_slot_reset - called when pci slot has been reset.
+ *
+ * This routine is called by the pci error recovery recovery
+ * code after the PCI slot has been reset, just before we
+ * should resume normal operations.
+ */
+static int ipr_eeh_slot_reset(struct pci_dev *pdev)
+{
+	unsigned long flags = 0;
+	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
+
+	// pci_enable_device(pdev);
+	// pci_set_master(pdev);
+	spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
+	_ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space,
+	                                 IPR_SHUTDOWN_NONE);
+	spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
+
+	return PCIERR_RESULT_RECOVERED;
+}
+
+/** This routine is called when the PCI bus has permanently
+ *  failed.  This routine should purge all pending I/O and
+ *  shut down the device driver (close and unload).
+ */
+static void ipr_eeh_perm_failure(struct pci_dev *pdev)
+{
+	unsigned long flags = 0;
+	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
+
+	spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
+	if (ioa_cfg->sdt_state == WAIT_FOR_DUMP)
+		ioa_cfg->sdt_state = ABORT_DUMP;
+	ioa_cfg->reset_retries = IPR_NUM_RESET_RELOAD_RETRIES;
+	ioa_cfg->in_ioa_bringdown = 1;
+	ipr_initiate_ioa_reset(ioa_cfg, IPR_SHUTDOWN_NONE);
+	spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
+}
+
+static int ipr_eeh_error_detected(struct pci_dev *pdev,
+                                enum pci_channel_state state)
+{
+	switch (state) {
+		case pci_channel_io_frozen:
+			ipr_eeh_frozen (pdev);
+			return PCIERR_RESULT_NEED_RESET;
+
+		case pci_channel_io_perm_failure:
+			ipr_eeh_perm_failure (pdev);
+			return PCIERR_RESULT_DISCONNECT;
+			break;
+		default:
+			break;
+	}
+	return PCIERR_RESULT_NEED_RESET;
+}
+
+/* ------------- end of PCI Error Recovery suport ----------- */
+
 /**
  * ipr_probe_ioa_part2 - Initializes IOAs found in ipr_probe_ioa(..)
  * @ioa_cfg:	ioa cfg struct
@@ -6065,12 +6153,18 @@
 };
 MODULE_DEVICE_TABLE(pci, ipr_pci_table);
 
+static struct pci_error_handlers ipr_err_handler = {
+	.error_detected = ipr_eeh_error_detected,
+	.slot_reset = ipr_eeh_slot_reset,
+};
+
 static struct pci_driver ipr_driver = {
 	.name = IPR_NAME,
 	.id_table = ipr_pci_table,
 	.probe = ipr_probe,
 	.remove = ipr_remove,
 	.shutdown = ipr_shutdown,
+	.err_handler = &ipr_err_handler,
 };
 
 /**

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 28/42]: SCSI: add PCI error recovery to Symbios dev driver
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (26 preceding siblings ...)
  2005-11-04  0:53 ` [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver Linas Vepstas
@ 2005-11-04  0:53 ` Linas Vepstas
  2005-11-04  0:53 ` [PATCH 29/42]: ethernet: add PCI error recovery to e100 " Linas Vepstas
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:53 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the Symbios SCSI device driver.
The patch has been tested, and appears to work well.

Signed-off-by: Linas Vepstas <linas@linas.org>

--
Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_glue.c	2005-11-02 14:28:52.512031337 -0600
+++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c	2005-11-02 14:43:56.084343457 -0600
@@ -686,6 +686,10 @@
 
 	if (DEBUG_FLAGS & DEBUG_TINY) printf_debug ("[");
 
+	/* Avoid spinloop trying to handle interrupts on frozen device */
+	if (np->s.io_state != pci_channel_io_normal)
+		return IRQ_HANDLED;
+
 	spin_lock_irqsave(np->s.host->host_lock, flags);
 	sym_interrupt(np);
 	spin_unlock_irqrestore(np->s.host->host_lock, flags);
@@ -759,6 +763,25 @@
  */
 static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); }
 
+static void sym_eeh_timeout(u_long p)
+{
+	struct sym_eh_wait *ep = (struct sym_eh_wait *) p;
+	if (!ep)
+		return;
+	complete(&ep->done);
+}
+
+static void sym_eeh_done(struct sym_eh_wait *ep)
+{
+	if (!ep)
+		return;
+	ep->timed_out = 0;
+	if (!del_timer(&ep->timer))
+		return;
+
+	complete(&ep->done);
+}
+
 /*
  *  Generic method for our eh processing.
  *  The 'op' argument tells what we have to do.
@@ -799,6 +822,35 @@
 
 	/* Try to proceed the operation we have been asked for */
 	sts = -1;
+
+	/* We may be in an error condition because the PCI bus
+	 * went down. In this case, we need to wait until the
+	 * PCI bus is reset, the card is reset, and only then
+	 * proceed with the scsi error recovery.  We'll wait
+	 * for 15 seconds for this to happen.
+	 */
+#define WAIT_FOR_PCI_RECOVERY	15
+	if (np->s.io_state != pci_channel_io_normal) {
+		struct sym_eh_wait eeh, *eep = &eeh;
+		np->s.io_reset_wait = eep;
+		init_completion(&eep->done);
+		init_timer(&eep->timer);
+		eep->to_do = SYM_EH_DO_WAIT;
+		eep->timer.expires = jiffies + (WAIT_FOR_PCI_RECOVERY*HZ);
+		eep->timer.function = sym_eeh_timeout;
+		eep->timer.data = (u_long)eep;
+		eep->timed_out = 1;	/* Be pessimistic for once :) */
+		add_timer(&eep->timer);
+		spin_unlock_irq(np->s.host->host_lock);
+		wait_for_completion(&eep->done);
+		spin_lock_irq(np->s.host->host_lock);
+		if (eep->timed_out) {
+			printk (KERN_ERR "%s: Timed out waiting for PCI reset\n",
+			       sym_name(np));
+		}
+		np->s.io_reset_wait = NULL;
+	}
+
 	switch(op) {
 	case SYM_EH_ABORT:
 		sts = sym_abort_scsiio(np, cmd, 1);
@@ -1584,6 +1636,8 @@
 	np->maxoffs	= dev->chip.offset_max;
 	np->maxburst	= dev->chip.burst_max;
 	np->myaddr	= dev->host_id;
+	np->s.io_state = pci_channel_io_normal;
+	np->s.io_reset_wait = NULL;
 
 	/*
 	 *  Edit its name.
@@ -1916,6 +1970,58 @@
 	return 1;
 }
 
+/* ------------- PCI Error Recovery infrastructure -------------- */
+/** sym2_io_error_detected() is called when PCI error is detected */
+static int sym2_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state)
+{
+	struct sym_hcb *np = pci_get_drvdata(pdev);
+
+	np->s.io_state = state;
+	// XXX If slot is permanently frozen, then what?
+	// Should we scsi_remove_host() maybe ??
+
+	/* Request a slot slot reset. */
+	return PCIERR_RESULT_NEED_RESET;
+}
+
+/** sym2_io_slot_reset is called when the pci bus has been reset.
+ *  Restart the card from scratch. */
+static int sym2_io_slot_reset (struct pci_dev *pdev)
+{
+	struct sym_hcb *np = pci_get_drvdata(pdev);
+
+	printk (KERN_INFO "%s: recovering from a PCI slot reset\n",
+	    sym_name(np));
+
+	if (pci_enable_device(pdev))
+		printk (KERN_ERR "%s: device setup failed most egregiously\n",
+			    sym_name(np));
+
+	pci_set_master(pdev);
+	enable_irq (pdev->irq);
+
+	/* Perform host reset only on one instance of the card */
+	if (0 == PCI_FUNC (pdev->devfn))
+		sym_reset_scsi_bus(np, 0);
+
+	return PCIERR_RESULT_RECOVERED;
+}
+
+/** sym2_io_resume is called when the error recovery driver
+ *  tells us that its OK to resume normal operation.
+ */
+static void sym2_io_resume (struct pci_dev *pdev)
+{
+	struct sym_hcb *np = pci_get_drvdata(pdev);
+
+	/* Perform device startup only once for this card. */
+	if (0 == PCI_FUNC (pdev->devfn))
+		sym_start_up (np, 1);
+
+	np->s.io_state = pci_channel_io_normal;
+	sym_eeh_done (np->s.io_reset_wait);
+}
+
 /*
  * Driver host template.
  */
@@ -2169,11 +2275,18 @@
 
 MODULE_DEVICE_TABLE(pci, sym2_id_table);
 
+static struct pci_error_handlers sym2_err_handler = {
+	.error_detected = sym2_io_error_detected,
+	.slot_reset = sym2_io_slot_reset,
+	.resume = sym2_io_resume,
+};
+
 static struct pci_driver sym2_driver = {
 	.name		= NAME53C8XX,
 	.id_table	= sym2_id_table,
 	.probe		= sym2_probe,
 	.remove		= __devexit_p(sym2_remove),
+	.err_handler = &sym2_err_handler,
 };
 
 static int __init sym2_init(void)
Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.h
===================================================================
--- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_glue.h	2005-11-02 14:28:52.513031197 -0600
+++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.h	2005-11-02 14:43:56.089342756 -0600
@@ -181,6 +181,10 @@
 	char		chip_name[8];
 	struct pci_dev	*device;
 
+	/* pci bus i/o state; waiter for clearing of i/o state */
+	enum pci_channel_state io_state;
+	struct sym_eh_wait *io_reset_wait;
+
 	struct Scsi_Host *host;
 
 	void __iomem *	ioaddr;		/* MMIO kernel io address	*/
Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_hipd.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_hipd.c	2005-11-02 14:28:52.513031197 -0600
+++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_hipd.c	2005-11-02 14:43:56.141335464 -0600
@@ -2809,6 +2809,7 @@
 	u_char	istat, istatc;
 	u_char	dstat;
 	u_short	sist;
+	u_int    icnt;
 
 	/*
 	 *  interrupt on the fly ?
@@ -2850,6 +2851,7 @@
 	sist	= 0;
 	dstat	= 0;
 	istatc	= istat;
+	icnt = 0;
 	do {
 		if (istatc & SIP)
 			sist  |= INW(np, nc_sist);
@@ -2857,6 +2859,19 @@
 			dstat |= INB(np, nc_dstat);
 		istatc = INB(np, nc_istat);
 		istat |= istatc;
+		
+		/* Prevent deadlock waiting on a condition that may never clear. */
+		/* XXX this is a temporary kludge; the correct to detect
+		 * a PCI bus error would be to use the io_check interfaces
+		 * proposed by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
+		 * Problem with polling like that is the state flag might not
+		 * be set.
+		 */
+		icnt ++;
+		if (100 < icnt) {
+			if (np->s.device->error_state != pci_channel_io_normal)
+				return;
+		}
 	} while (istatc & (SIP|DIP));
 
 	if (DEBUG_FLAGS & DEBUG_TINY)

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 29/42]: ethernet: add PCI error recovery to e100 dev driver
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (27 preceding siblings ...)
  2005-11-04  0:53 ` [PATCH 28/42]: SCSI: add PCI error recovery to Symbios " Linas Vepstas
@ 2005-11-04  0:53 ` Linas Vepstas
  2005-11-04  1:34   ` Jesse Brandeburg
  2005-11-04  0:54 ` [PATCH 30/42]: ethernet: add PCI error recovery to e1000 " Linas Vepstas
                   ` (14 subsequent siblings)
  43 siblings, 1 reply; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:53 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the intel ethernet e100
device driver. The patch has been tested, and appears to work well.

Signed-off-by: Linas Vepstas <linas@linas.org>

--
Index: linux-2.6.14-git3/drivers/net/e100.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/net/e100.c	2005-11-02 14:28:51.524169808 -0600
+++ linux-2.6.14-git3/drivers/net/e100.c	2005-11-02 14:43:58.890949857 -0600
@@ -2465,6 +2465,75 @@
 }
 
 
+/* ------------------ PCI Error Recovery infrastructure  -------------- */
+/** e100_io_error_detected() is called when PCI error is detected */
+static int e100_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+
+	/* Same as calling e100_down(netdev_priv(netdev)), but generic */
+	netdev->stop(netdev);
+
+	/* Is a detach needed ?? */
+	// netif_device_detach(netdev);
+
+	/* Request a slot reset. */
+	return PCIERR_RESULT_NEED_RESET;
+}
+
+/** e100_io_slot_reset is called after the pci bus has been reset.
+ *  Restart the card from scratch. */
+static int e100_io_slot_reset(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nic *nic = netdev_priv(netdev);
+
+	if(pci_enable_device(pdev)) {
+		printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n");
+		return PCIERR_RESULT_DISCONNECT;
+	}
+	pci_set_master(pdev);
+
+	/* Only one device per card can do a reset */
+	if (0 != PCI_FUNC (pdev->devfn))
+		return PCIERR_RESULT_RECOVERED;
+
+	e100_hw_reset(nic);
+	e100_phy_init(nic);
+
+	if(e100_hw_init(nic)) {
+		DPRINTK(HW, ERR, "e100_hw_init failed\n");
+		return PCIERR_RESULT_DISCONNECT;
+	}
+
+	return PCIERR_RESULT_RECOVERED;
+}
+
+/** e100_io_resume is called when the error recovery driver
+ *  tells us that its OK to resume normal operation.
+ */
+static void e100_io_resume(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nic *nic = netdev_priv(netdev);
+
+	/* ack any pending wake events, disable PME */
+	pci_enable_wake(pdev, 0, 0);
+
+	netif_device_attach(netdev);
+	if(netif_running(netdev)) {
+		e100_open (netdev);
+		mod_timer(&nic->watchdog, jiffies);
+	}
+}
+
+static struct pci_error_handlers e100_err_handler = {
+	.error_detected = e100_io_error_detected,
+	.slot_reset = e100_io_slot_reset,
+	.resume = e100_io_resume,
+};
+
+
 static struct pci_driver e100_driver = {
 	.name =         DRV_NAME,
 	.id_table =     e100_id_table,
@@ -2475,6 +2544,7 @@
 	.resume =       e100_resume,
 #endif
 	.shutdown =	e100_shutdown,
+	.err_handler = &e100_err_handler,
 };
 
 static int __init e100_init_module(void)

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 30/42]: ethernet: add PCI error recovery to e1000 dev driver
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (28 preceding siblings ...)
  2005-11-04  0:53 ` [PATCH 29/42]: ethernet: add PCI error recovery to e100 " Linas Vepstas
@ 2005-11-04  0:54 ` Linas Vepstas
  2005-11-04  0:54 ` [PATCH 31/42]: ethernet: add PCI error recovery to ixgb " Linas Vepstas
                   ` (13 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:54 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the intel gigabit
ethernet e1000 device driver. The patch has been tested, and appears
to work well.

Signed-off-by: Linas Vepstas <linas@linas.org>

--
Index: linux-2.6.14-git3/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/net/e1000/e1000_main.c	2005-11-02 14:28:50.471317390 -0600
+++ linux-2.6.14-git3/drivers/net/e1000/e1000_main.c	2005-11-02 14:44:00.730691851 -0600
@@ -206,6 +206,16 @@
 void e1000_rx_schedule(void *data);
 #endif
 
+static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state);
+static int e1000_io_slot_reset(struct pci_dev *pdev);
+static void e1000_io_resume(struct pci_dev *pdev);
+
+static struct pci_error_handlers e1000_err_handler = {
+	.error_detected = e1000_io_error_detected,
+	.slot_reset = e1000_io_slot_reset,
+	.resume = e1000_io_resume,
+};
+
 /* Exported from other modules */
 
 extern void e1000_check_options(struct e1000_adapter *adapter);
@@ -218,8 +228,9 @@
 	/* Power Managment Hooks */
 #ifdef CONFIG_PM
 	.suspend  = e1000_suspend,
-	.resume   = e1000_resume
+	.resume   = e1000_resume,
 #endif
+	.err_handler = &e1000_err_handler,
 };
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -2937,6 +2948,10 @@
 
 #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
 
+	/* Prevent stats update while adapter is being reset */
+	if (adapter->link_speed == 0)
+		return;
+
 	spin_lock_irqsave(&adapter->stats_lock, flags);
 
 	/* these counters are modified from e1000_adjust_tbi_stats,
@@ -4358,4 +4373,88 @@
 }
 #endif
 
+/* --------------- PCI Error Recovery infrastructure ------------ */
+/** e1000_io_error_detected() is called when PCI error is detected */
+static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+
+	if (netif_running(netdev))
+		e1000_down(adapter);
+
+	/* Request a slot slot reset. */
+	return PCIERR_RESULT_NEED_RESET;
+}
+
+/** e1000_io_slot_reset is called after the pci bus has been reset.
+ *  Restart the card from scratch.
+ *  Implementation resembles the first-half of the
+ *  e1000_resume routine.
+ */
+static int e1000_io_slot_reset(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+
+	if (pci_enable_device(pdev)) {
+		printk(KERN_ERR "e1000: Cannot re-enable PCI device after reset.\n");
+		return PCIERR_RESULT_DISCONNECT;
+	}
+	pci_set_master(pdev);
+
+	pci_enable_wake(pdev, 3, 0);
+	pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */
+
+	/* Perform card reset only on one instance of the card */
+	if(0 != PCI_FUNC (pdev->devfn))
+		return PCIERR_RESULT_RECOVERED;
+
+	e1000_reset(adapter);
+	E1000_WRITE_REG(&adapter->hw, WUS, ~0);
+
+	return PCIERR_RESULT_RECOVERED;
+}
+
+/** e1000_io_resume is called when the error recovery driver
+ *  tells us that its OK to resume normal operation.
+ *  Implementation resembles the second-half of the
+ *  e1000_resume routine.
+ */
+static void e1000_io_resume(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+	uint32_t manc, swsm;
+
+	if(netif_running(netdev)) {
+		if (e1000_up(adapter)) {
+			printk("e1000: can't bring device back up after reset\n");
+			return;
+		}
+	}
+
+	netif_device_attach(netdev);
+
+	if(adapter->hw.mac_type >= e1000_82540 &&
+	    adapter->hw.media_type == e1000_media_type_copper) {
+		manc = E1000_READ_REG(&adapter->hw, MANC);
+		manc &= ~(E1000_MANC_ARP_EN);
+		E1000_WRITE_REG(&adapter->hw, MANC, manc);
+	}
+
+	switch(adapter->hw.mac_type) {
+	case e1000_82573:
+		swsm = E1000_READ_REG(&adapter->hw, SWSM);
+		E1000_WRITE_REG(&adapter->hw, SWSM,
+				swsm | E1000_SWSM_DRV_LOAD);
+		break;
+	default:
+		break;
+	}
+
+	if(netif_running(netdev))
+		mod_timer(&adapter->watchdog_timer, jiffies);
+}
+
 /* e1000_main.c */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 31/42]: ethernet: add PCI error recovery to ixgb dev driver
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (29 preceding siblings ...)
  2005-11-04  0:54 ` [PATCH 30/42]: ethernet: add PCI error recovery to e1000 " Linas Vepstas
@ 2005-11-04  0:54 ` Linas Vepstas
  2005-11-04  0:54 ` [PATCH 32/42]: RFC: Add compile-time config options Linas Vepstas
                   ` (12 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:54 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the intel ten-gigabit
ethernet ixgb device driver. The patch has been tested, and appears
to work well.

Signed-off-by: Linas Vepstas <linas@linas.org>

--
Index: linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/net/ixgb/ixgb_main.c	2005-11-02 14:28:49.225492020 -0600
+++ linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c	2005-11-02 14:44:02.380460486 -0600
@@ -132,6 +132,16 @@
 static void ixgb_netpoll(struct net_device *dev);
 #endif
 
+static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state);
+static int ixgb_io_slot_reset (struct pci_dev *pdev);
+static void ixgb_io_resume (struct pci_dev *pdev);
+
+static struct pci_error_handlers ixgb_err_handler = {
+	.error_detected = ixgb_io_error_detected,
+	.slot_reset = ixgb_io_slot_reset,
+	.resume = ixgb_io_resume,
+};
+
 /* Exported from other modules */
 
 extern void ixgb_check_options(struct ixgb_adapter *adapter);
@@ -141,6 +151,8 @@
 	.id_table = ixgb_pci_tbl,
 	.probe    = ixgb_probe,
 	.remove   = __devexit_p(ixgb_remove),
+	.err_handler = &ixgb_err_handler,
+
 };
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -1654,8 +1666,16 @@
 	unsigned int i;
 #endif
 
+#ifdef XXX_CONFIG_IXGB_EEH_RECOVERY
+	if(unlikely(icr==EEH_IO_ERROR_VALUE(4))) {
+		if (eeh_slot_is_isolated (adapter->pdev))
+		// disable_irq_nosync (adapter->pdev->irq);
+		return IRQ_NONE;      /* Not our interrupt */
+	}
+#else
 	if(unlikely(!icr))
 		return IRQ_NONE;  /* Not our interrupt */
+#endif /* CONFIG_IXGB_EEH_RECOVERY */
 
 	if(unlikely(icr & (IXGB_INT_RXSEQ | IXGB_INT_LSC))) {
 		mod_timer(&adapter->watchdog_timer, jiffies);
@@ -2125,4 +2145,70 @@
 }
 #endif
 
+/* -------------- PCI Error Recovery infrastructure ---------------- */
+/** ixgb_io_error_detected() is called when PCI error is detected */
+static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct ixgb_adapter *adapter = netdev->priv;
+
+	if(netif_running(netdev))
+		ixgb_down(adapter, TRUE);
+
+	/* Request a slot reset. */
+	return PCIERR_RESULT_NEED_RESET;
+}
+
+/** ixgb_io_slot_reset is called after the pci bus has been reset.
+ *  Restart the card from scratch.
+ *  Implementation resembles the first-half of the
+ *  ixgb_resume routine.
+ */
+static int ixgb_io_slot_reset (struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct ixgb_adapter *adapter = netdev->priv;
+
+	if(pci_enable_device(pdev)) {
+		printk(KERN_ERR "ixgb: Cannot re-enable PCI device after reset.\n");
+		return PCIERR_RESULT_DISCONNECT;
+	}
+	pci_set_master(pdev);
+
+	/* Perform card reset only on one instance of the card */
+	if (0 != PCI_FUNC (pdev->devfn))
+		return PCIERR_RESULT_RECOVERED;
+
+	ixgb_reset(adapter);
+
+	return PCIERR_RESULT_RECOVERED;
+}
+
+/** ixgb_io_resume is called when the error recovery driver
+ *  tells us that its OK to resume normal operation.
+ *  Implementation resembles the second-half of the
+ *  ixgb_resume routine.
+ */
+static void ixgb_io_resume (struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct ixgb_adapter *adapter = netdev->priv;
+
+	if(netif_running(netdev)) {
+		if(ixgb_up(adapter)) {
+			printk ("ixgb: can't bring device back up after reset\n");
+			return;
+		}
+	}
+
+	netif_device_attach(netdev);
+	if(netif_running(netdev))
+		mod_timer(&adapter->watchdog_timer, jiffies);
+
+	/* Reading all-ff's from the adapter will completely hose
+	 * the counts and statistics. So just clear them out */
+	memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats));
+	ixgb_update_stats(adapter);
+}
+
 /* ixgb_main.c */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 32/42]: RFC: Add compile-time config options
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (30 preceding siblings ...)
  2005-11-04  0:54 ` [PATCH 31/42]: ethernet: add PCI error recovery to ixgb " Linas Vepstas
@ 2005-11-04  0:54 ` Linas Vepstas
  2005-11-04  0:54 ` [PATCH 33/42]: ppc64: remove bogus printk Linas Vepstas
                   ` (11 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:54 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

32-pci-error-recovery_config-option.patch

This OPTIONAL/RFC patch adds ifdef's around the PCI error recovery code in the 
various device drivers. This patch is "optional" in that its a little bit 
messy, but it does solve a little problem.

-- The good news: this gives some users (e.g. embeddd systems) the option 
	of not compiling in this code, thus making thier device drivers a tiny 
	bit smaller.

-- The bad news: This also clutters up the drivers with extraneous markup 
   and the config process with yet another config.

I don't know if this patch is worth it.  Apply or reject, as desired ... 
Its up to you ... :-)

Signed-off-by: Linas Vepstas <linas@linas.org>

Index: linux-2.6.14-git3/drivers/scsi/ipr.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/scsi/ipr.c	2005-11-02 14:43:52.782806465 -0600
+++ linux-2.6.14-git3/drivers/scsi/ipr.c	2005-11-02 14:44:04.167209911 -0600
@@ -5329,6 +5329,8 @@
 }
 
 /* --------------- PCI Error Recovery infrastructure ----------- */
+#ifdef CONFIG_PCIERR_RECOVERY
+
 /** If the PCI slot is frozen, hold off all i/o
  *  activity; then, as soon as the slot is available again,
  *  initiate an adapter reset.
@@ -5414,6 +5416,7 @@
 	return PCIERR_RESULT_NEED_RESET;
 }
 
+#endif /* CONFIG_PCIERR_RECOVERY */
 /* ------------- end of PCI Error Recovery suport ----------- */
 
 /**
@@ -6153,10 +6156,12 @@
 };
 MODULE_DEVICE_TABLE(pci, ipr_pci_table);
 
+#ifdef CONFIG_PCIERR_RECOVERY
 static struct pci_error_handlers ipr_err_handler = {
 	.error_detected = ipr_eeh_error_detected,
 	.slot_reset = ipr_eeh_slot_reset,
 };
+#endif /* CONFIG_PCIERR_RECOVERY */
 
 static struct pci_driver ipr_driver = {
 	.name = IPR_NAME,
@@ -6164,7 +6169,9 @@
 	.probe = ipr_probe,
 	.remove = ipr_remove,
 	.shutdown = ipr_shutdown,
+#ifdef CONFIG_PCIERR_RECOVERY
 	.err_handler = &ipr_err_handler,
+#endif /* CONFIG_PCIERR_RECOVERY */
 };
 
 /**
Index: linux-2.6.14-git3/drivers/pci/Kconfig
===================================================================
--- linux-2.6.14-git3.orig/drivers/pci/Kconfig	2005-11-02 14:28:48.597580036 -0600
+++ linux-2.6.14-git3/drivers/pci/Kconfig	2005-11-02 14:44:04.172209210 -0600
@@ -13,6 +13,21 @@
 
 	   If you don't know what to do here, say N.
 
+config PCIERR_RECOVERY
+	bool "PCI Error Recovery support"
+	depends on PCI
+	depends on PPC_PSERIES
+	default y
+	help
+	   PCI Error Recovery is a mechanism by which crashed/hung 
+		PCI adapters are automatically detected and rebooted without
+		otherwise disturbing the operation of the system.  Support
+		for this recovery requires special PCI bridge chips (some
+		PCI-E chips may have this support) as well as support in 
+		the device drivers (not all device drivers can handle this).
+
+	   When in doubt, say Y.
+
 config PCI_LEGACY_PROC
 	bool "Legacy /proc/pci interface"
 	depends on PCI
Index: linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/scsi/sym53c8xx_2/sym_glue.c	2005-11-02 14:43:56.084343457 -0600
+++ linux-2.6.14-git3/drivers/scsi/sym53c8xx_2/sym_glue.c	2005-11-02 14:44:04.195205985 -0600
@@ -763,6 +763,7 @@
  */
 static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); }
 
+#ifdef CONFIG_PCIERR_RECOVERY
 static void sym_eeh_timeout(u_long p)
 {
 	struct sym_eh_wait *ep = (struct sym_eh_wait *) p;
@@ -781,6 +782,7 @@
 
 	complete(&ep->done);
 }
+#endif /* CONFIG_PCIERR_RECOVERY */
 
 /*
  *  Generic method for our eh processing.
@@ -823,6 +825,7 @@
 	/* Try to proceed the operation we have been asked for */
 	sts = -1;
 
+#ifdef CONFIG_PCIERR_RECOVERY
 	/* We may be in an error condition because the PCI bus
 	 * went down. In this case, we need to wait until the
 	 * PCI bus is reset, the card is reset, and only then
@@ -850,6 +853,7 @@
 		}
 		np->s.io_reset_wait = NULL;
 	}
+#endif /* CONFIG_PCIERR_RECOVERY */
 
 	switch(op) {
 	case SYM_EH_ABORT:
@@ -1971,6 +1975,7 @@
 }
 
 /* ------------- PCI Error Recovery infrastructure -------------- */
+#ifdef CONFIG_PCIERR_RECOVERY
 /** sym2_io_error_detected() is called when PCI error is detected */
 static int sym2_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state)
 {
@@ -2021,6 +2026,7 @@
 	np->s.io_state = pci_channel_io_normal;
 	sym_eeh_done (np->s.io_reset_wait);
 }
+#endif /* CONFIG_PCIERR_RECOVERY */
 
 /*
  * Driver host template.
@@ -2275,18 +2281,22 @@
 
 MODULE_DEVICE_TABLE(pci, sym2_id_table);
 
+#ifdef CONFIG_PCIERR_RECOVERY
 static struct pci_error_handlers sym2_err_handler = {
 	.error_detected = sym2_io_error_detected,
 	.slot_reset = sym2_io_slot_reset,
 	.resume = sym2_io_resume,
 };
+#endif /* CONFIG_PCIERR_RECOVERY */
 
 static struct pci_driver sym2_driver = {
 	.name		= NAME53C8XX,
 	.id_table	= sym2_id_table,
 	.probe		= sym2_probe,
 	.remove		= __devexit_p(sym2_remove),
+#ifdef CONFIG_PCIERR_RECOVERY
 	.err_handler = &sym2_err_handler,
+#endif /* CONFIG_PCIERR_RECOVERY */
 };
 
 static int __init sym2_init(void)
Index: linux-2.6.14-git3/drivers/net/e100.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/net/e100.c	2005-11-02 14:43:58.890949857 -0600
+++ linux-2.6.14-git3/drivers/net/e100.c	2005-11-02 14:44:04.222202199 -0600
@@ -2466,6 +2466,7 @@
 
 
 /* ------------------ PCI Error Recovery infrastructure  -------------- */
+#ifdef CONFIG_PCIERR_RECOVERY
 /** e100_io_error_detected() is called when PCI error is detected */
 static int e100_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state)
 {
@@ -2532,6 +2533,7 @@
 	.slot_reset = e100_io_slot_reset,
 	.resume = e100_io_resume,
 };
+#endif /* CONFIG_PCIERR_RECOVERY */
 
 
 static struct pci_driver e100_driver = {
@@ -2544,7 +2546,9 @@
 	.resume =       e100_resume,
 #endif
 	.shutdown =	e100_shutdown,
+#ifdef CONFIG_PCIERR_RECOVERY
 	.err_handler = &e100_err_handler,
+#endif /* CONFIG_PCIERR_RECOVERY */
 };
 
 static int __init e100_init_module(void)
Index: linux-2.6.14-git3/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/net/e1000/e1000_main.c	2005-11-02 14:44:00.730691851 -0600
+++ linux-2.6.14-git3/drivers/net/e1000/e1000_main.c	2005-11-02 14:44:04.266196029 -0600
@@ -206,6 +206,7 @@
 void e1000_rx_schedule(void *data);
 #endif
 
+#ifdef CONFIG_PCIERR_RECOVERY
 static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state);
 static int e1000_io_slot_reset(struct pci_dev *pdev);
 static void e1000_io_resume(struct pci_dev *pdev);
@@ -215,6 +216,7 @@
 	.slot_reset = e1000_io_slot_reset,
 	.resume = e1000_io_resume,
 };
+#endif /* CONFIG_PCIERR_RECOVERY */
 
 /* Exported from other modules */
 
@@ -230,7 +232,9 @@
 	.suspend  = e1000_suspend,
 	.resume   = e1000_resume,
 #endif
+#ifdef CONFIG_PCIERR_RECOVERY
 	.err_handler = &e1000_err_handler,
+#endif /* CONFIG_PCIERR_RECOVERY */
 };
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -4374,6 +4378,7 @@
 #endif
 
 /* --------------- PCI Error Recovery infrastructure ------------ */
+#ifdef CONFIG_PCIERR_RECOVERY
 /** e1000_io_error_detected() is called when PCI error is detected */
 static int e1000_io_error_detected(struct pci_dev *pdev, enum pci_channel_state state)
 {
@@ -4456,5 +4461,6 @@
 	if(netif_running(netdev))
 		mod_timer(&adapter->watchdog_timer, jiffies);
 }
+#endif /* CONFIG_PCIERR_RECOVERY */
 
 /* e1000_main.c */
Index: linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c
===================================================================
--- linux-2.6.14-git3.orig/drivers/net/ixgb/ixgb_main.c	2005-11-02 14:44:02.380460486 -0600
+++ linux-2.6.14-git3/drivers/net/ixgb/ixgb_main.c	2005-11-02 14:44:04.289192804 -0600
@@ -132,6 +132,7 @@
 static void ixgb_netpoll(struct net_device *dev);
 #endif
 
+#ifdef CONFIG_PCIERR_RECOVERY
 static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state);
 static int ixgb_io_slot_reset (struct pci_dev *pdev);
 static void ixgb_io_resume (struct pci_dev *pdev);
@@ -141,6 +142,7 @@
 	.slot_reset = ixgb_io_slot_reset,
 	.resume = ixgb_io_resume,
 };
+#endif /* CONFIG_PCIERR_RECOVERY */
 
 /* Exported from other modules */
 
@@ -151,8 +153,9 @@
 	.id_table = ixgb_pci_tbl,
 	.probe    = ixgb_probe,
 	.remove   = __devexit_p(ixgb_remove),
+#ifdef CONFIG_PCIERR_RECOVERY
 	.err_handler = &ixgb_err_handler,
-
+#endif /* CONFIG_PCIERR_RECOVERY */
 };
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -2146,6 +2149,7 @@
 #endif
 
 /* -------------- PCI Error Recovery infrastructure ---------------- */
+#ifdef CONFIG_PCIERR_RECOVERY
 /** ixgb_io_error_detected() is called when PCI error is detected */
 static int ixgb_io_error_detected (struct pci_dev *pdev, enum pci_channel_state state)
 {
@@ -2210,5 +2214,6 @@
 	memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats));
 	ixgb_update_stats(adapter);
 }
+#endif /* CONFIG_PCIERR_RECOVERY */
 
 /* ixgb_main.c */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 33/42]: ppc64: remove bogus printk
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (31 preceding siblings ...)
  2005-11-04  0:54 ` [PATCH 32/42]: RFC: Add compile-time config options Linas Vepstas
@ 2005-11-04  0:54 ` Linas Vepstas
  2005-11-04  0:54 ` [PATCH 34/42]: ppc64: Remove duplicate code Linas Vepstas
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:54 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

233-eeh-buid-fix.patch

Remove un-desired warning print from EEH code. 

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>


Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:43:49.212307192 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:45:00.429319560 -0600
@@ -824,12 +824,10 @@
 	if (!dn || !PCI_DN(dn))
 		return;
 	phb = PCI_DN(dn)->phb;
-	if (NULL == phb || 0 == phb->buid) {
-		printk(KERN_WARNING "EEH: Expected buid but found none for %s\n",
-		       dn->full_name);
-		dump_stack();
+
+	/* USB Bus children of PCI devices will not have BUID's */
+	if (NULL == phb || 0 == phb->buid)
 		return;
-	}
 
 	info.buid_hi = BUID_HI(phb->buid);
 	info.buid_lo = BUID_LO(phb->buid);

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 34/42]: ppc64: Remove duplicate code
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (32 preceding siblings ...)
  2005-11-04  0:54 ` [PATCH 33/42]: ppc64: remove bogus printk Linas Vepstas
@ 2005-11-04  0:54 ` Linas Vepstas
  2005-11-04  0:54 ` [PATCH 35/42]: ppc64: bugfix: fill in un-initialzed field Linas Vepstas
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:54 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

234-eeh-find-pe.patch

The find_device_pe() routine is duplicated in two files. Remove one of the two
copies, declare the other extern.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_driver.c	2005-11-02 14:41:18.435451353 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c	2005-11-02 14:45:43.638259683 -0600
@@ -42,19 +42,6 @@
 	return "";
 }
 
-/**
- * Return the "partitionable endpoint" (pe) under which this device lies
- */
-static struct device_node * find_device_pe(struct device_node *dn)
-{
-	while ((dn->parent) && PCI_DN(dn->parent) &&
-	      (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
-		dn = dn->parent;
-	}
-	return dn;
-}
-
-
 #ifdef DEBUG
 static void print_device_node_tree (struct pci_dn *pdn, int dent)
 {
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:45:00.429319560 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:45:43.651257860 -0600
@@ -172,7 +172,7 @@
 /** 
  * Return the "partitionable endpoint" (pe) under which this device lies
  */
-static struct device_node * find_device_pe(struct device_node *dn)
+struct device_node * find_device_pe(struct device_node *dn)
 {
 	while ((dn->parent) && PCI_DN(dn->parent) &&
 	      (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h	2005-11-02 14:42:38.998153856 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h	2005-11-02 14:45:43.656257159 -0600
@@ -110,6 +110,9 @@
 void eeh_mark_slot (struct device_node *dn, int mode_flag);
 void eeh_clear_slot (struct device_node *dn, int mode_flag);
 
+/* Find the associated "Partiationable Endpoint" PE */
+struct device_node * find_device_pe(struct device_node *dn);
+
 #endif
 
 #endif /* _ASM_POWERPC_PPC_PCI_H */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 35/42]: ppc64: bugfix: fill in un-initialzed field
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (33 preceding siblings ...)
  2005-11-04  0:54 ` [PATCH 34/42]: ppc64: Remove duplicate code Linas Vepstas
@ 2005-11-04  0:54 ` Linas Vepstas
  2005-11-04  0:54 ` [PATCH 36/42]: ppc64: Use PE configuration address consistently Linas Vepstas
                   ` (8 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:54 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

235-eeh-set-pcidev-bugfix.patch

The pci device field should be initialized to a valid value.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_cache.c	2005-11-02 14:42:38.994154417 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c	2005-11-02 14:46:23.687642815 -0600
@@ -307,6 +307,9 @@
 		/* Save the BAR's; firmware doesn't restore these after EEH reset */
 		dn = pci_device_to_OF_node(dev);
 		eeh_save_bars(dev, PCI_DN(dn));
+
+		pci_dev_get (dev);  /* matching put is in eeh_remove_device() */
+		PCI_DN(dn)->pcidev = dev;
 	}
 
 #ifdef DEBUG

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 36/42]: ppc64: Use PE configuration address consistently
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (34 preceding siblings ...)
  2005-11-04  0:54 ` [PATCH 35/42]: ppc64: bugfix: fill in un-initialzed field Linas Vepstas
@ 2005-11-04  0:54 ` Linas Vepstas
  2005-11-04  0:54 ` [PATCH 37/42]: ppc64: set up the RTAS token just like the rest of them Linas Vepstas
                   ` (7 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:54 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

236-eeh-config-addr.patch

The PE configuration address wasn't being cnsistently used in all locations
where a config address is called for.  This patch adds it to the places it
should have appeared in.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:45:43.651257860 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:47:07.798456202 -0600
@@ -110,6 +110,7 @@
 
 void eeh_slot_error_detail (struct pci_dn *pdn, int severity)
 {
+	int config_addr;
 	unsigned long flags;
 	int rc;
 
@@ -117,8 +118,13 @@
 	spin_lock_irqsave(&slot_errbuf_lock, flags);
 	memset(slot_errbuf, 0, eeh_error_buf_size);
 
+	/* Use PE configuration address, if present */
+	config_addr = pdn->eeh_config_addr;
+	if (pdn->eeh_pe_config_addr)
+		config_addr = pdn->eeh_pe_config_addr;
+
 	rc = rtas_call(ibm_slot_error_detail,
-	               8, 1, NULL, pdn->eeh_config_addr,
+	               8, 1, NULL, config_addr,
 	               BUID_HI(pdn->phb->buid),
 	               BUID_LO(pdn->phb->buid), NULL, 0,
 	               virt_to_phys(slot_errbuf),
@@ -138,6 +144,7 @@
 static int read_slot_reset_state(struct pci_dn *pdn, int rets[])
 {
 	int token, outputs;
+	int config_addr;
 
 	if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) {
 		token = ibm_read_slot_reset_state2;
@@ -148,7 +155,12 @@
 		outputs = 3;
 	}
 
-	return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr,
+	/* Use PE configuration address, if present */
+	config_addr = pdn->eeh_config_addr;
+	if (pdn->eeh_pe_config_addr)
+		config_addr = pdn->eeh_pe_config_addr;
+
+	return rtas_call(token, 3, outputs, rets, config_addr,
 			 BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid));
 }
 
@@ -284,7 +296,7 @@
 		return 0;
 	}
 
-	if (!pdn->eeh_config_addr) {
+	if (!pdn->eeh_config_addr && !pdn->eeh_pe_config_addr) {
 		__get_cpu_var(no_cfg_addr)++;
 		return 0;
 	}
@@ -613,13 +625,20 @@
 void
 rtas_configure_bridge(struct pci_dn *pdn)
 {
+	int config_addr;
 	int token = rtas_token ("ibm,configure-bridge");
 	int rc;
 
 	if (token == RTAS_UNKNOWN_SERVICE)
 		return;
+	
+	/* Use PE configuration address, if present */
+	config_addr = pdn->eeh_config_addr;
+	if (pdn->eeh_pe_config_addr)
+		config_addr = pdn->eeh_pe_config_addr;
+
 	rc = rtas_call(token,3,1, NULL,
-	               pdn->eeh_config_addr,
+	               config_addr,
 	               BUID_HI(pdn->phb->buid),
 	               BUID_LO(pdn->phb->buid));
 	if (rc) {

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 37/42]: ppc64: set up the RTAS token just like the rest of them.
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (35 preceding siblings ...)
  2005-11-04  0:54 ` [PATCH 36/42]: ppc64: Use PE configuration address consistently Linas Vepstas
@ 2005-11-04  0:54 ` Linas Vepstas
  2005-11-04  0:54 ` [PATCH 38/42]: ppc64: Don't continue with PCI Error recovery if slot reset failed Linas Vepstas
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:54 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

237-eeh-bridge-token.patch

Minor: the rtas-bridge toekn should be set up the same way that all the 
other tokens rtas tokens are set up.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:47:07.798456202 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:47:38.997080468 -0600
@@ -84,6 +84,7 @@
 static int ibm_read_slot_reset_state2;
 static int ibm_slot_error_detail;
 static int ibm_get_config_addr_info;
+static int ibm_configure_bridge;
 
 static int eeh_subsystem_enabled;
 
@@ -626,18 +627,14 @@
 rtas_configure_bridge(struct pci_dn *pdn)
 {
 	int config_addr;
-	int token = rtas_token ("ibm,configure-bridge");
 	int rc;
 
-	if (token == RTAS_UNKNOWN_SERVICE)
-		return;
-	
 	/* Use PE configuration address, if present */
 	config_addr = pdn->eeh_config_addr;
 	if (pdn->eeh_pe_config_addr)
 		config_addr = pdn->eeh_pe_config_addr;
 
-	rc = rtas_call(token,3,1, NULL,
+	rc = rtas_call(ibm_configure_bridge,3,1, NULL,
 	               config_addr,
 	               BUID_HI(pdn->phb->buid),
 	               BUID_LO(pdn->phb->buid));
@@ -789,6 +786,7 @@
 	ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state");
 	ibm_slot_error_detail = rtas_token("ibm,slot-error-detail");
 	ibm_get_config_addr_info = rtas_token("ibm,get-config-addr-info");
+	ibm_configure_bridge = rtas_token ("ibm,configure-bridge");
 
 	if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE)
 		return;

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 38/42]: ppc64: Don't continue with PCI Error recovery if slot reset failed.
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (36 preceding siblings ...)
  2005-11-04  0:54 ` [PATCH 37/42]: ppc64: set up the RTAS token just like the rest of them Linas Vepstas
@ 2005-11-04  0:54 ` Linas Vepstas
  2005-11-04  0:55 ` [PATCH 39/42]: ppc64: handle multifunction PCI devices properly Linas Vepstas
                   ` (5 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:54 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

238-eeh-stop-if-reset_failed.patch

If the firmware is unable to reset the PCI slot for some reason, then 
don't attempt any further recovery steps after that point.  Instead,
mark the device as permanently failed.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:47:38.997080468 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:48:13.093298267 -0600
@@ -450,11 +450,16 @@
 	if (rc) return rc;
 
 	if (rets[1] == 0) return -1;  /* EEH is not supported */
-	if (rets[0] == 0)  return 0;  /* Oll Korrect */
+	if (rets[0] == 0) return 0;   /* Oll Korrect */
 	if (rets[0] == 5) {
 		if (rets[2] == 0) return -1; /* permanently unavailable */
 		return rets[2]; /* number of millisecs to wait */
 	}
+	if (rets[0] == 1)
+		return 250;
+
+	printk (KERN_ERR "EEH: Slot unavailable: rc=%d, rets=%d %d %d\n",
+		rc, rets[0], rets[1], rets[2]);
 	return -1;
 }
 
@@ -501,9 +506,11 @@
 
 /** rtas_set_slot_reset -- assert the pci #RST line for 1/4 second
  *  dn -- device node to be reset.
+ *
+ *  Return 0 if success, else a non-zero value.
  */
 
-void
+int
 rtas_set_slot_reset(struct pci_dn *pdn)
 {
 	int i, rc;
@@ -533,10 +540,21 @@
 	 * ready to be used; if not, wait for recovery. */
 	for (i=0; i<10; i++) {
 		rc = eeh_slot_availability (pdn);
-		if (rc <= 0) break;
+		if (rc < 0)
+			printk (KERN_ERR "EEH: failed (%d) to reset slot %s\n", rc, pdn->node->full_name);
+		if (rc == 0)
+			return 0;
+		if (rc < 0)
+			return -1;
 
 		msleep (rc+100);
 	}
+
+	rc = eeh_slot_availability (pdn);
+	if (rc)
+		printk (KERN_ERR "EEH: timeout resetting slot %s\n", pdn->node->full_name);
+
+	return rc;
 }
 
 /* ------------------------------------------------------- */
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_driver.c	2005-11-02 14:45:43.638259683 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c	2005-11-02 14:48:13.100297285 -0600
@@ -200,14 +200,18 @@
  *            bus resets can be performed.
  */
 
-static void eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus)
+static int eeh_reset_device (struct pci_dn *pe_dn, struct pci_bus *bus)
 {
+	int rc;
 	if (bus)
 		pcibios_remove_pci_devices(bus);
 
 	/* Reset the pci controller. (Asserts RST#; resets config space).
-	 * Reconfigure bridges and devices */
-	rtas_set_slot_reset(pe_dn);
+	 * Reconfigure bridges and devices. Don't try to bring the system
+	 * up if the reset failed for some reason. */
+	rc = rtas_set_slot_reset(pe_dn);
+	if (rc)
+		return rc;
 
 	/* Walk over all functions on this device */
 	rtas_configure_bridge(pe_dn);
@@ -223,6 +227,8 @@
 		ssleep (5);
 		pcibios_add_pci_devices(bus);
 	}
+
+	return 0;
 }
 
 /* The longest amount of time to wait for a pci device
@@ -235,7 +241,7 @@
 	struct device_node *frozen_dn;
 	struct pci_dn *frozen_pdn;
 	struct pci_bus *frozen_bus;
-	int perm_failure = 0;
+	int rc = 0;
 
 	frozen_dn = find_device_pe(event->dn);
 	frozen_bus = pcibios_find_pci_bus(frozen_dn);
@@ -272,7 +278,7 @@
 	frozen_pdn->eeh_freeze_count++;
 	
 	if (frozen_pdn->eeh_freeze_count > EEH_MAX_ALLOWED_FREEZES)
-		perm_failure = 1;
+		goto hard_fail;
 
 	/* If the reset state is a '5' and the time to reset is 0 (infinity)
 	 * or is more then 15 seconds, then mark this as a permanent failure.
@@ -280,34 +286,7 @@
 	if ((event->state == pci_channel_io_perm_failure) &&
 	    ((event->time_unavail <= 0) ||
 	     (event->time_unavail > MAX_WAIT_FOR_RECOVERY*1000)))
-	{
-		perm_failure = 1;
-	}
-
-	/* Log the error with the rtas logger. */
-	if (perm_failure) {
-		/*
-		 * About 90% of all real-life EEH failures in the field
-		 * are due to poorly seated PCI cards. Only 10% or so are
-		 * due to actual, failed cards.
-		 */
-		printk(KERN_ERR
-		   "EEH: PCI device %s - %s has failed %d times \n"
-		   "and has been permanently disabled.  Please try reseating\n"
-		   "this device or replacing it.\n",
-			pci_name (frozen_pdn->pcidev), 
-			pcid_name(frozen_pdn->pcidev), 
-			frozen_pdn->eeh_freeze_count);
-
-		eeh_slot_error_detail(frozen_pdn, 2 /* Permanent Error */);
-
-		/* Notify all devices that they're about to go down. */
-		pci_walk_bus(frozen_bus, eeh_report_failure, 0);
-
-		/* Shut down the device drivers for good. */
-		pcibios_remove_pci_devices(frozen_bus);
-		return;
-	}
+		goto hard_fail;
 
 	eeh_slot_error_detail(frozen_pdn, 1 /* Temporary Error */);
 	printk(KERN_WARNING
@@ -330,24 +309,54 @@
 	 * go down willingly, without panicing the system.
 	 */
 	if (result == PCIERR_RESULT_NONE) {
-		eeh_reset_device(frozen_pdn, frozen_bus);
+		rc = eeh_reset_device(frozen_pdn, frozen_bus);
+		if (rc)
+			goto hard_fail;
 	}
 
 	/* If any device called out for a reset, then reset the slot */
 	if (result == PCIERR_RESULT_NEED_RESET) {
-		eeh_reset_device(frozen_pdn, NULL);
+		rc = eeh_reset_device(frozen_pdn, NULL);
+		if (rc)
+			goto hard_fail;
 		pci_walk_bus(frozen_bus, eeh_report_reset, 0);
 	}
 
 	/* If all devices reported they can proceed, the re-enable PIO */
 	if (result == PCIERR_RESULT_CAN_RECOVER) {
 		/* XXX Not supported; we brute-force reset the device */
-		eeh_reset_device(frozen_pdn, NULL);
+		rc = eeh_reset_device(frozen_pdn, NULL);
+		if (rc)
+			goto hard_fail;
 		pci_walk_bus(frozen_bus, eeh_report_reset, 0);
 	}
 
 	/* Tell all device drivers that they can resume operations */
 	pci_walk_bus(frozen_bus, eeh_report_resume, 0);
+
+	return;
+	
+hard_fail:
+	/*
+	 * About 90% of all real-life EEH failures in the field
+	 * are due to poorly seated PCI cards. Only 10% or so are
+	 * due to actual, failed cards.
+	 */
+	printk(KERN_ERR
+	   "EEH: PCI device %s - %s has failed %d times \n"
+	   "and has been permanently disabled.  Please try reseating\n"
+	   "this device or replacing it.\n",
+		pci_name (frozen_pdn->pcidev), 
+		pcid_name(frozen_pdn->pcidev), 
+		frozen_pdn->eeh_freeze_count);
+
+	eeh_slot_error_detail(frozen_pdn, 2 /* Permanent Error */);
+
+	/* Notify all devices that they're about to go down. */
+	pci_walk_bus(frozen_bus, eeh_report_failure, 0);
+
+	/* Shut down the device drivers for good. */
+	pcibios_remove_pci_devices(frozen_bus);
 }
 
 /* ---------- end of file ---------- */
Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h	2005-11-02 14:45:43.656257159 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h	2005-11-02 14:48:13.104296724 -0600
@@ -77,8 +77,10 @@
  * does this by asserting the PCI #RST line for 1/8th of
  * a second; this routine will sleep while the adapter is
  * being reset.
+ *
+ * Returns a non-zero value if the reset failed.
  */
-void rtas_set_slot_reset (struct pci_dn *);
+int rtas_set_slot_reset (struct pci_dn *);
 
 /** 
  * eeh_restore_bars - Restore device configuration info.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 39/42]: ppc64: handle multifunction PCI devices  properly
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (37 preceding siblings ...)
  2005-11-04  0:54 ` [PATCH 38/42]: ppc64: Don't continue with PCI Error recovery if slot reset failed Linas Vepstas
@ 2005-11-04  0:55 ` Linas Vepstas
  2005-11-04  0:55 ` [PATCH 40/42]: ppc64: IOMMU: don't ioremap null pointers Linas Vepstas
                   ` (4 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:55 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

239-eeh-multifunction-consolidate.patch

New-style firmware will often place multiple different functions 
under a non-EEH-aware parent.  However, tehse devices might share 
a common PE "partition endpoint" and config address, ad thus any
EEH events will affect all of the devices in common.  This patch
makes the effort to find all of these common devices and handle
them together.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

--
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:48:13.093298267 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:48:44.941831253 -0600
@@ -223,6 +223,11 @@
 void eeh_mark_slot (struct device_node *dn, int mode_flag)
 {
 	dn = find_device_pe (dn);
+
+	/* Back up one, since config addrs might be shared */
+	if (PCI_DN(dn) && PCI_DN(dn)->eeh_pe_config_addr)
+		dn = dn->parent;
+
 	PCI_DN(dn)->eeh_mode |= mode_flag;
 	__eeh_mark_slot (dn->child, mode_flag);
 }
@@ -244,7 +249,13 @@
 {
 	unsigned long flags;
 	spin_lock_irqsave(&confirm_error_lock, flags);
+	
 	dn = find_device_pe (dn);
+	
+	/* Back up one, since config addrs might be shared */
+	if (PCI_DN(dn) && PCI_DN(dn)->eeh_pe_config_addr)
+		dn = dn->parent;
+
 	PCI_DN(dn)->eeh_mode &= ~mode_flag;
 	PCI_DN(dn)->eeh_check_count = 0;
 	__eeh_clear_slot (dn->child, mode_flag);
@@ -609,7 +620,7 @@
 	if (!pdn) 
 		return;
 	
-	if (! pdn->eeh_is_bridge)
+	if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && (!pdn->eeh_is_bridge))
 		__restore_bars (pdn);
 
 	dn = pdn->node->child;
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_driver.c	2005-11-02 14:48:13.100297285 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_driver.c	2005-11-02 14:48:44.950829991 -0600
@@ -213,9 +213,23 @@
 	if (rc)
 		return rc;
 
-	/* Walk over all functions on this device */
-	rtas_configure_bridge(pe_dn);
-	eeh_restore_bars(pe_dn);
+ 	/* New-style config addrs might be shared across multiple devices,
+ 	 * Walk over all functions on this device */
+ 	if (pe_dn->eeh_pe_config_addr) {
+ 		struct device_node *pe = pe_dn->node;
+ 		pe = pe->parent->child;
+ 		while (pe) {
+ 			struct pci_dn *ppe = PCI_DN(pe);
+ 			if (pe_dn->eeh_pe_config_addr == ppe->eeh_pe_config_addr) {
+ 				rtas_configure_bridge(ppe);
+ 				eeh_restore_bars(ppe);
+ 			}
+ 			pe = pe->sibling;
+ 		}
+ 	} else {
+ 		rtas_configure_bridge(pe_dn);
+ 		eeh_restore_bars(pe_dn);
+ 	}
 
 	/* Give the system 5 seconds to finish running the user-space
 	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes, 

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 40/42]: ppc64: IOMMU: don't ioremap null pointers
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (38 preceding siblings ...)
  2005-11-04  0:55 ` [PATCH 39/42]: ppc64: handle multifunction PCI devices properly Linas Vepstas
@ 2005-11-04  0:55 ` Linas Vepstas
  2005-11-04  0:55 ` [PATCH 41/42]: ppc64: Save device BARS much earlier in the boot sequence Linas Vepstas
                   ` (3 subsequent siblings)
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:55 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

240-ioremap-null-ptr-test.patch

Under highly unusual circumstances, a buggy driver will ask a null ptr to be
ioremapped, an operation that curently suceeds but leads to later trouble.
Instead, refuse to remap the null pointer.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

--
Index: linux-2.6.14-git3/arch/powerpc/mm/pgtable_64.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/mm/pgtable_64.c	2005-11-02 14:59:56.507624778 -0600
+++ linux-2.6.14-git3/arch/powerpc/mm/pgtable_64.c	2005-11-02 15:01:04.284115774 -0600
@@ -185,7 +185,7 @@
 	pa = addr & PAGE_MASK;
 	size = PAGE_ALIGN(addr + size) - pa;
 
-	if (size == 0)
+	if ((size == 0) || (pa == 0))
 		return NULL;
 
 	if (mem_init_done) {

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 41/42]: ppc64: Save device BARS much earlier in the boot sequence
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (39 preceding siblings ...)
  2005-11-04  0:55 ` [PATCH 40/42]: ppc64: IOMMU: don't ioremap null pointers Linas Vepstas
@ 2005-11-04  0:55 ` Linas Vepstas
  2005-11-04 22:14   ` linas
  2005-11-04  0:55 ` [PATCH 42/42]: ppc64: get rid of per_cpu counters Linas Vepstas
                   ` (2 subsequent siblings)
  43 siblings, 1 reply; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:55 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

241-eeh-save-bars-earlier.patch

Save the PCI device bars *before* any PCI probing is done. 

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

--
Index: linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/rtas_pci.c	2005-10-31 12:01:21.000000000 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/rtas_pci.c	2005-11-02 16:52:48.556202006 -0600
@@ -72,7 +72,7 @@
         return 0;
 }
 
-static int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
+int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
 {
 	int returnval = -1;
 	unsigned long buid, addr;
Index: linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-powerpc/ppc-pci.h	2005-11-02 16:53:29.000000000 -0600
+++ linux-2.6.14-git3/include/asm-powerpc/ppc-pci.h	2005-11-02 17:28:14.843073955 -0600
@@ -59,8 +59,6 @@
 void pci_addr_cache_build(void);
 struct pci_dev *pci_get_device_by_addr(unsigned long addr);
 
-void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn);
-
 /**
  * eeh_slot_error_detail -- record and EEH error condition to the log
  * @severity: 1 if temporary, 2 if permanent failure.
@@ -104,6 +102,7 @@
 void rtas_configure_bridge(struct pci_dn *);
 
 int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
+int rtas_read_config(struct pci_dn *, int where, int size, u32 *val);
 
 /**
  * mark and clear slots: find "partition endpoint" PE and set or 
Index: linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h
===================================================================
--- linux-2.6.14-git3.orig/include/asm-ppc64/pci-bridge.h	2005-11-02 14:43:49.000000000 -0600
+++ linux-2.6.14-git3/include/asm-ppc64/pci-bridge.h	2005-11-02 17:13:07.358586231 -0600
@@ -58,15 +58,15 @@
 struct iommu_table;
 
 struct pci_dn {
-	int	busno;			/* for pci devices */
-	int	bussubno;		/* for pci devices */
-	int	devfn;			/* for pci devices */
+	int	busno;			/* pci bus number */
+	int	bussubno;		/* pci subordinate bus number */
+	int	devfn;			/* pci device and function number */
+	int	class_code;		/* pci device class */
 	int	eeh_mode;		/* See eeh.h for possible EEH_MODEs */
 	int	eeh_config_addr;
 	int	eeh_pe_config_addr; /* new-style partition endpoint address */
 	int 	eeh_check_count;	/* # times driver ignored error */
 	int 	eeh_freeze_count;	/* # times this device froze up. */
-	int	eeh_is_bridge;		/* device is pci-to-pci bridge */
 
 	int	pci_ext_config_space;	/* for pci devices */
 	struct  pci_controller *phb;	/* for pci devices */
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 16:45:55.000000000 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 18:42:28.243139205 -0600
@@ -106,6 +106,8 @@
 static DEFINE_PER_CPU(unsigned long, ignored_failures);
 static DEFINE_PER_CPU(unsigned long, slot_resets);
 
+#define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE)
+
 /* --------------------------------------------------------------- */
 /* Below lies the EEH event infrastructure */
 
@@ -620,7 +622,7 @@
 	if (!pdn) 
 		return;
 	
-	if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && (!pdn->eeh_is_bridge))
+	if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && !IS_BRIDGE(pdn->class_code))
 		__restore_bars (pdn);
 
 	dn = pdn->node->child;
@@ -638,18 +640,15 @@
  * PCI devices are added individuallly; but, for the restore,
  * an entire slot is reset at a time.
  */
-void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn)
+static void eeh_save_bars(struct pci_dn *pdn)
 {
 	int i;
 
-	if (!pdev || !pdn )
+	if (!pdn )
 		return;
 	
 	for (i = 0; i < 16; i++)
-		pci_read_config_dword(pdev, i * 4, &pdn->config_space[i]);
-
-	if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
-		pdn->eeh_is_bridge = 1;
+		rtas_read_config(pdn, i * 4, 4, &pdn->config_space[i]);
 }
 
 void
@@ -699,6 +698,7 @@
 	int enable;
 	struct pci_dn *pdn = PCI_DN(dn);
 
+	pdn->class_code = *class_code;
 	pdn->eeh_mode = 0;
 	pdn->eeh_check_count = 0;
 	pdn->eeh_freeze_count = 0;
@@ -781,6 +781,7 @@
 		       dn->full_name);
 	}
 
+	eeh_save_bars(pdn);
 	return NULL;
 }
 
@@ -915,7 +916,6 @@
 	pdn->pcidev = dev;
 
 	pci_addr_cache_insert_device (dev);
-	eeh_save_bars(dev, pdn);
 }
 EXPORT_SYMBOL_GPL(eeh_add_device_late);
 
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh_cache.c	2005-11-02 16:45:55.000000000 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh_cache.c	2005-11-02 18:40:54.893242771 -0600
@@ -304,10 +304,7 @@
 
 		pci_addr_cache_insert_device(dev);
 
-		/* Save the BAR's; firmware doesn't restore these after EEH reset */
 		dn = pci_device_to_OF_node(dev);
-		eeh_save_bars(dev, PCI_DN(dn));
-
 		pci_dev_get (dev);  /* matching put is in eeh_remove_device() */
 		PCI_DN(dn)->pcidev = dev;
 	}

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 42/42]: ppc64: get rid of per_cpu counters
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (40 preceding siblings ...)
  2005-11-04  0:55 ` [PATCH 41/42]: ppc64: Save device BARS much earlier in the boot sequence Linas Vepstas
@ 2005-11-04  0:55 ` Linas Vepstas
  2005-11-04  0:57 ` [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64 Linas Vepstas
  2005-11-04 22:14 ` [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Greg KH
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:55 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

242-eeh-no-percpu-counters.patch

Remove per-cpu counters from the EEH code.  These statistics counters are 
incremented at a very low-frequency, and the performance gains of per-cpu 
variables are negligable.  By conrast, the counters weren't safe against
cpu gard operations, and its not worth the effeort to make them so (other
than to turn them into plain globals).

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

--
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 18:42:28.243139205 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 18:49:24.196716323 -0600
@@ -97,14 +97,14 @@
 static int eeh_error_buf_size;
 
 /* System monitoring statistics */
-static DEFINE_PER_CPU(unsigned long, no_device);
-static DEFINE_PER_CPU(unsigned long, no_dn);
-static DEFINE_PER_CPU(unsigned long, no_cfg_addr);
-static DEFINE_PER_CPU(unsigned long, ignored_check);
-static DEFINE_PER_CPU(unsigned long, total_mmio_ffs);
-static DEFINE_PER_CPU(unsigned long, false_positives);
-static DEFINE_PER_CPU(unsigned long, ignored_failures);
-static DEFINE_PER_CPU(unsigned long, slot_resets);
+static unsigned long no_device;
+static unsigned long no_dn;
+static unsigned long no_cfg_addr;
+static unsigned long ignored_check;
+static unsigned long total_mmio_ffs;
+static unsigned long false_positives;
+static unsigned long ignored_failures;
+static unsigned long slot_resets;
 
 #define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE)
 
@@ -288,13 +288,13 @@
 	enum pci_channel_state state;
 	int rc = 0;
 
-	__get_cpu_var(total_mmio_ffs)++;
+	total_mmio_ffs++;
 
 	if (!eeh_subsystem_enabled)
 		return 0;
 
 	if (!dn) {
-		__get_cpu_var(no_dn)++;
+		no_dn++;
 		return 0;
 	}
 	pdn = PCI_DN(dn);
@@ -302,7 +302,7 @@
 	/* Access to IO BARs might get this far and still not want checking. */
 	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
 	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
-		__get_cpu_var(ignored_check)++;
+		ignored_check++;
 #ifdef DEBUG
 		printk ("EEH:ignored check (%x) for %s %s\n", 
 		        pdn->eeh_mode, pci_name (dev), dn->full_name);
@@ -311,7 +311,7 @@
 	}
 
 	if (!pdn->eeh_config_addr && !pdn->eeh_pe_config_addr) {
-		__get_cpu_var(no_cfg_addr)++;
+		no_cfg_addr++;
 		return 0;
 	}
 
@@ -353,7 +353,7 @@
 	if (ret != 0) {
 		printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n",
 		       ret, dn->full_name);
-		__get_cpu_var(false_positives)++;
+		false_positives++;
 		rc = 0;
 		goto dn_unlock;
 	}
@@ -362,14 +362,14 @@
 	if (rets[1] != 1) {
 		printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n",
 		       ret, dn->full_name);
-		__get_cpu_var(false_positives)++;
+		false_positives++;
 		rc = 0;
 		goto dn_unlock;
 	}
 
 	/* If not the kind of error we know about, punt. */
 	if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) {
-		__get_cpu_var(false_positives)++;
+		false_positives++;
 		rc = 0;
 		goto dn_unlock;
 	}
@@ -377,12 +377,12 @@
 	/* Note that config-io to empty slots may fail;
 	 * we recognize empty because they don't have children. */
 	if ((rets[0] == 5) && (dn->child == NULL)) {
-		__get_cpu_var(false_positives)++;
+		false_positives++;
 		rc = 0;
 		goto dn_unlock;
 	}
 
-	__get_cpu_var(slot_resets)++;
+	slot_resets++;
  
 	/* Avoid repeated reports of this failure, including problems
 	 * with other functions on this device, and functions under
@@ -432,7 +432,7 @@
 	addr = eeh_token_to_phys((unsigned long __force) token);
 	dev = pci_get_device_by_addr(addr);
 	if (!dev) {
-		__get_cpu_var(no_device)++;
+		no_device++;
 		return val;
 	}
 
@@ -963,25 +963,9 @@
 
 static int proc_eeh_show(struct seq_file *m, void *v)
 {
-	unsigned int cpu;
-	unsigned long ffs = 0, positives = 0, failures = 0;
-	unsigned long resets = 0;
-	unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;
-
-	for_each_cpu(cpu) {
-		ffs += per_cpu(total_mmio_ffs, cpu);
-		positives += per_cpu(false_positives, cpu);
-		failures += per_cpu(ignored_failures, cpu);
-		resets += per_cpu(slot_resets, cpu);
-		no_dev += per_cpu(no_device, cpu);
-		no_dn += per_cpu(no_dn, cpu);
-		no_cfg += per_cpu(no_cfg_addr, cpu);
-		no_check += per_cpu(ignored_check, cpu);
-	}
-
 	if (0 == eeh_subsystem_enabled) {
 		seq_printf(m, "EEH Subsystem is globally disabled\n");
-		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
+		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", total_mmio_ffs);
 	} else {
 		seq_printf(m, "EEH Subsystem is enabled\n");
 		seq_printf(m,
@@ -993,8 +977,10 @@
 				"eeh_false_positives=%ld\n"
 				"eeh_ignored_failures=%ld\n"
 				"eeh_slot_resets=%ld\n",
-				no_dev, no_dn, no_cfg, no_check,
-				ffs, positives, failures, resets);
+				no_device, no_dn, no_cfg_addr, 
+				ignored_check, total_mmio_ffs, 
+				false_positives, ignored_failures, 
+				slot_resets);
 	}
 
 	return 0;

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (41 preceding siblings ...)
  2005-11-04  0:55 ` [PATCH 42/42]: ppc64: get rid of per_cpu counters Linas Vepstas
@ 2005-11-04  0:57 ` Linas Vepstas
  2005-11-04 22:14 ` [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Greg KH
  43 siblings, 0 replies; 131+ messages in thread
From: Linas Vepstas @ 2005-11-04  0:57 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

11-eeh-move-to-powerpc.patch

Move arch/ppc64/kernel/eeh.c to arch//powerpc/platforms/pseries/eeh.c
No other changes (except for Makefile to build it)

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

Index: linux-2.6.14-git3/arch/ppc64/kernel/eeh.c
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/eeh.c	2005-11-02 14:29:22.485829789 -0600
+++ /dev/null	1970-01-01 00:00:00.000000000 +0000
@@ -1,1093 +0,0 @@
-/*
- * eeh.c
- * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
- */
-
-#include <linux/init.h>
-#include <linux/list.h>
-#include <linux/notifier.h>
-#include <linux/pci.h>
-#include <linux/proc_fs.h>
-#include <linux/rbtree.h>
-#include <linux/seq_file.h>
-#include <linux/spinlock.h>
-#include <asm/atomic.h>
-#include <asm/eeh.h>
-#include <asm/io.h>
-#include <asm/machdep.h>
-#include <asm/rtas.h>
-#include <asm/atomic.h>
-#include <asm/systemcfg.h>
-#include <asm/ppc-pci.h>
-
-#undef DEBUG
-
-/** Overview:
- *  EEH, or "Extended Error Handling" is a PCI bridge technology for
- *  dealing with PCI bus errors that can't be dealt with within the
- *  usual PCI framework, except by check-stopping the CPU.  Systems
- *  that are designed for high-availability/reliability cannot afford
- *  to crash due to a "mere" PCI error, thus the need for EEH.
- *  An EEH-capable bridge operates by converting a detected error
- *  into a "slot freeze", taking the PCI adapter off-line, making
- *  the slot behave, from the OS'es point of view, as if the slot
- *  were "empty": all reads return 0xff's and all writes are silently
- *  ignored.  EEH slot isolation events can be triggered by parity
- *  errors on the address or data busses (e.g. during posted writes),
- *  which in turn might be caused by low voltage on the bus, dust,
- *  vibration, humidity, radioactivity or plain-old failed hardware.
- *
- *  Note, however, that one of the leading causes of EEH slot
- *  freeze events are buggy device drivers, buggy device microcode,
- *  or buggy device hardware.  This is because any attempt by the
- *  device to bus-master data to a memory address that is not
- *  assigned to the device will trigger a slot freeze.   (The idea
- *  is to prevent devices-gone-wild from corrupting system memory).
- *  Buggy hardware/drivers will have a miserable time co-existing
- *  with EEH.
- *
- *  Ideally, a PCI device driver, when suspecting that an isolation
- *  event has occured (e.g. by reading 0xff's), will then ask EEH
- *  whether this is the case, and then take appropriate steps to
- *  reset the PCI slot, the PCI device, and then resume operations.
- *  However, until that day,  the checking is done here, with the
- *  eeh_check_failure() routine embedded in the MMIO macros.  If
- *  the slot is found to be isolated, an "EEH Event" is synthesized
- *  and sent out for processing.
- */
-
-/* EEH event workqueue setup. */
-static DEFINE_SPINLOCK(eeh_eventlist_lock);
-LIST_HEAD(eeh_eventlist);
-static void eeh_event_handler(void *);
-DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL);
-
-static struct notifier_block *eeh_notifier_chain;
-
-/* If a device driver keeps reading an MMIO register in an interrupt
- * handler after a slot isolation event has occurred, we assume it
- * is broken and panic.  This sets the threshold for how many read
- * attempts we allow before panicking.
- */
-#define EEH_MAX_FAILS	100000
-
-/* RTAS tokens */
-static int ibm_set_eeh_option;
-static int ibm_set_slot_reset;
-static int ibm_read_slot_reset_state;
-static int ibm_read_slot_reset_state2;
-static int ibm_slot_error_detail;
-
-static int eeh_subsystem_enabled;
-
-/* Lock to avoid races due to multiple reports of an error */
-static DEFINE_SPINLOCK(confirm_error_lock);
-
-/* Buffer for reporting slot-error-detail rtas calls */
-static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
-static DEFINE_SPINLOCK(slot_errbuf_lock);
-static int eeh_error_buf_size;
-
-/* System monitoring statistics */
-static DEFINE_PER_CPU(unsigned long, no_device);
-static DEFINE_PER_CPU(unsigned long, no_dn);
-static DEFINE_PER_CPU(unsigned long, no_cfg_addr);
-static DEFINE_PER_CPU(unsigned long, ignored_check);
-static DEFINE_PER_CPU(unsigned long, total_mmio_ffs);
-static DEFINE_PER_CPU(unsigned long, false_positives);
-static DEFINE_PER_CPU(unsigned long, ignored_failures);
-static DEFINE_PER_CPU(unsigned long, slot_resets);
-
-/**
- * The pci address cache subsystem.  This subsystem places
- * PCI device address resources into a red-black tree, sorted
- * according to the address range, so that given only an i/o
- * address, the corresponding PCI device can be **quickly**
- * found. It is safe to perform an address lookup in an interrupt
- * context; this ability is an important feature.
- *
- * Currently, the only customer of this code is the EEH subsystem;
- * thus, this code has been somewhat tailored to suit EEH better.
- * In particular, the cache does *not* hold the addresses of devices
- * for which EEH is not enabled.
- *
- * (Implementation Note: The RB tree seems to be better/faster
- * than any hash algo I could think of for this problem, even
- * with the penalty of slow pointer chases for d-cache misses).
- */
-struct pci_io_addr_range
-{
-	struct rb_node rb_node;
-	unsigned long addr_lo;
-	unsigned long addr_hi;
-	struct pci_dev *pcidev;
-	unsigned int flags;
-};
-
-static struct pci_io_addr_cache
-{
-	struct rb_root rb_root;
-	spinlock_t piar_lock;
-} pci_io_addr_cache_root;
-
-static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr)
-{
-	struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node;
-
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-
-		if (addr < piar->addr_lo) {
-			n = n->rb_left;
-		} else {
-			if (addr > piar->addr_hi) {
-				n = n->rb_right;
-			} else {
-				pci_dev_get(piar->pcidev);
-				return piar->pcidev;
-			}
-		}
-	}
-
-	return NULL;
-}
-
-/**
- * pci_get_device_by_addr - Get device, given only address
- * @addr: mmio (PIO) phys address or i/o port number
- *
- * Given an mmio phys address, or a port number, find a pci device
- * that implements this address.  Be sure to pci_dev_put the device
- * when finished.  I/O port numbers are assumed to be offset
- * from zero (that is, they do *not* have pci_io_addr added in).
- * It is safe to call this function within an interrupt.
- */
-static struct pci_dev *pci_get_device_by_addr(unsigned long addr)
-{
-	struct pci_dev *dev;
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	dev = __pci_get_device_by_addr(addr);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-	return dev;
-}
-
-#ifdef DEBUG
-/*
- * Handy-dandy debug print routine, does nothing more
- * than print out the contents of our addr cache.
- */
-static void pci_addr_cache_print(struct pci_io_addr_cache *cache)
-{
-	struct rb_node *n;
-	int cnt = 0;
-
-	n = rb_first(&cache->rb_root);
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-		printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n",
-		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
-		       piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev));
-		cnt++;
-		n = rb_next(n);
-	}
-}
-#endif
-
-/* Insert address range into the rb tree. */
-static struct pci_io_addr_range *
-pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
-		      unsigned long ahi, unsigned int flags)
-{
-	struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node;
-	struct rb_node *parent = NULL;
-	struct pci_io_addr_range *piar;
-
-	/* Walk tree, find a place to insert into tree */
-	while (*p) {
-		parent = *p;
-		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
-		if (ahi < piar->addr_lo) {
-			p = &parent->rb_left;
-		} else if (alo > piar->addr_hi) {
-			p = &parent->rb_right;
-		} else {
-			if (dev != piar->pcidev ||
-			    alo != piar->addr_lo || ahi != piar->addr_hi) {
-				printk(KERN_WARNING "PIAR: overlapping address range\n");
-			}
-			return piar;
-		}
-	}
-	piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
-	if (!piar)
-		return NULL;
-
-	piar->addr_lo = alo;
-	piar->addr_hi = ahi;
-	piar->pcidev = dev;
-	piar->flags = flags;
-
-#ifdef DEBUG
-	printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n",
-	                  alo, ahi, pci_name (dev));
-#endif
-
-	rb_link_node(&piar->rb_node, parent, p);
-	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
-
-	return piar;
-}
-
-static void __pci_addr_cache_insert_device(struct pci_dev *dev)
-{
-	struct device_node *dn;
-	struct pci_dn *pdn;
-	int i;
-	int inserted = 0;
-
-	dn = pci_device_to_OF_node(dev);
-	if (!dn) {
-		printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev));
-		return;
-	}
-
-	/* Skip any devices for which EEH is not enabled. */
-	pdn = PCI_DN(dn);
-	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
-	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
-#ifdef DEBUG
-		printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n",
-		       pci_name(dev), pdn->node->full_name);
-#endif
-		return;
-	}
-
-	/* The cache holds a reference to the device... */
-	pci_dev_get(dev);
-
-	/* Walk resources on this device, poke them into the tree */
-	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
-		unsigned long start = pci_resource_start(dev,i);
-		unsigned long end = pci_resource_end(dev,i);
-		unsigned int flags = pci_resource_flags(dev,i);
-
-		/* We are interested only bus addresses, not dma or other stuff */
-		if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM)))
-			continue;
-		if (start == 0 || ~start == 0 || end == 0 || ~end == 0)
-			 continue;
-		pci_addr_cache_insert(dev, start, end, flags);
-		inserted = 1;
-	}
-
-	/* If there was nothing to add, the cache has no reference... */
-	if (!inserted)
-		pci_dev_put(dev);
-}
-
-/**
- * pci_addr_cache_insert_device - Add a device to the address cache
- * @dev: PCI device whose I/O addresses we are interested in.
- *
- * In order to support the fast lookup of devices based on addresses,
- * we maintain a cache of devices that can be quickly searched.
- * This routine adds a device to that cache.
- */
-static void pci_addr_cache_insert_device(struct pci_dev *dev)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	__pci_addr_cache_insert_device(dev);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-}
-
-static inline void __pci_addr_cache_remove_device(struct pci_dev *dev)
-{
-	struct rb_node *n;
-	int removed = 0;
-
-restart:
-	n = rb_first(&pci_io_addr_cache_root.rb_root);
-	while (n) {
-		struct pci_io_addr_range *piar;
-		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-
-		if (piar->pcidev == dev) {
-			rb_erase(n, &pci_io_addr_cache_root.rb_root);
-			removed = 1;
-			kfree(piar);
-			goto restart;
-		}
-		n = rb_next(n);
-	}
-
-	/* The cache no longer holds its reference to this device... */
-	if (removed)
-		pci_dev_put(dev);
-}
-
-/**
- * pci_addr_cache_remove_device - remove pci device from addr cache
- * @dev: device to remove
- *
- * Remove a device from the addr-cache tree.
- * This is potentially expensive, since it will walk
- * the tree multiple times (once per resource).
- * But so what; device removal doesn't need to be that fast.
- */
-static void pci_addr_cache_remove_device(struct pci_dev *dev)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
-	__pci_addr_cache_remove_device(dev);
-	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-}
-
-/**
- * pci_addr_cache_build - Build a cache of I/O addresses
- *
- * Build a cache of pci i/o addresses.  This cache will be used to
- * find the pci device that corresponds to a given address.
- * This routine scans all pci busses to build the cache.
- * Must be run late in boot process, after the pci controllers
- * have been scaned for devices (after all device resources are known).
- */
-void __init pci_addr_cache_build(void)
-{
-	struct pci_dev *dev = NULL;
-
-	if (!eeh_subsystem_enabled)
-		return;
-
-	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
-
-	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
-		/* Ignore PCI bridges ( XXX why ??) */
-		if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) {
-			continue;
-		}
-		pci_addr_cache_insert_device(dev);
-	}
-
-#ifdef DEBUG
-	/* Verify tree built up above, echo back the list of addrs. */
-	pci_addr_cache_print(&pci_io_addr_cache_root);
-#endif
-}
-
-/* --------------------------------------------------------------- */
-/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */
-
-void eeh_slot_error_detail (struct pci_dn *pdn, int severity)
-{
-	unsigned long flags;
-	int rc;
-
-	/* Log the error with the rtas logger */
-	spin_lock_irqsave(&slot_errbuf_lock, flags);
-	memset(slot_errbuf, 0, eeh_error_buf_size);
-
-	rc = rtas_call(ibm_slot_error_detail,
-	               8, 1, NULL, pdn->eeh_config_addr,
-	               BUID_HI(pdn->phb->buid),
-	               BUID_LO(pdn->phb->buid), NULL, 0,
-	               virt_to_phys(slot_errbuf),
-	               eeh_error_buf_size,
-	               severity);
-
-	if (rc == 0)
-		log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0);
-	spin_unlock_irqrestore(&slot_errbuf_lock, flags);
-}
-
-/**
- * eeh_register_notifier - Register to find out about EEH events.
- * @nb: notifier block to callback on events
- */
-int eeh_register_notifier(struct notifier_block *nb)
-{
-	return notifier_chain_register(&eeh_notifier_chain, nb);
-}
-
-/**
- * eeh_unregister_notifier - Unregister to an EEH event notifier.
- * @nb: notifier block to callback on events
- */
-int eeh_unregister_notifier(struct notifier_block *nb)
-{
-	return notifier_chain_unregister(&eeh_notifier_chain, nb);
-}
-
-/**
- * read_slot_reset_state - Read the reset state of a device node's slot
- * @dn: device node to read
- * @rets: array to return results in
- */
-static int read_slot_reset_state(struct pci_dn *pdn, int rets[])
-{
-	int token, outputs;
-
-	if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) {
-		token = ibm_read_slot_reset_state2;
-		outputs = 4;
-	} else {
-		token = ibm_read_slot_reset_state;
-		rets[2] = 0; /* fake PE Unavailable info */
-		outputs = 3;
-	}
-
-	return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr,
-			 BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid));
-}
-
-/**
- * eeh_panic - call panic() for an eeh event that cannot be handled.
- * The philosophy of this routine is that it is better to panic and
- * halt the OS than it is to risk possible data corruption by
- * oblivious device drivers that don't know better.
- *
- * @dev pci device that had an eeh event
- * @reset_state current reset state of the device slot
- */
-static void eeh_panic(struct pci_dev *dev, int reset_state)
-{
-	/*
-	 * XXX We should create a separate sysctl for this.
-	 *
-	 * Since the panic_on_oops sysctl is used to halt the system
-	 * in light of potential corruption, we can use it here.
-	 */
-	if (panic_on_oops) {
-		struct device_node *dn = pci_device_to_OF_node(dev);
-		eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */);
-		panic("EEH: MMIO failure (%d) on device:%s\n", reset_state,
-		      pci_name(dev));
-	}
-	else {
-		__get_cpu_var(ignored_failures)++;
-		printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n",
-		       reset_state, pci_name(dev));
-	}
-}
-
-/**
- * eeh_event_handler - dispatch EEH events.  The detection of a frozen
- * slot can occur inside an interrupt, where it can be hard to do
- * anything about it.  The goal of this routine is to pull these
- * detection events out of the context of the interrupt handler, and
- * re-dispatch them for processing at a later time in a normal context.
- *
- * @dummy - unused
- */
-static void eeh_event_handler(void *dummy)
-{
-	unsigned long flags;
-	struct eeh_event	*event;
-
-	while (1) {
-		spin_lock_irqsave(&eeh_eventlist_lock, flags);
-		event = NULL;
-		if (!list_empty(&eeh_eventlist)) {
-			event = list_entry(eeh_eventlist.next, struct eeh_event, list);
-			list_del(&event->list);
-		}
-		spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
-		if (event == NULL)
-			break;
-
-		printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device "
-		       "%s\n", event->reset_state,
-		       pci_name(event->dev));
-
-		notifier_call_chain (&eeh_notifier_chain,
-				     EEH_NOTIFY_FREEZE, event);
-
-		pci_dev_put(event->dev);
-		kfree(event);
-	}
-}
-
-/**
- * eeh_token_to_phys - convert EEH address token to phys address
- * @token i/o token, should be address in the form 0xA....
- */
-static inline unsigned long eeh_token_to_phys(unsigned long token)
-{
-	pte_t *ptep;
-	unsigned long pa;
-
-	ptep = find_linux_pte(init_mm.pgd, token);
-	if (!ptep)
-		return token;
-	pa = pte_pfn(*ptep) << PAGE_SHIFT;
-
-	return pa | (token & (PAGE_SIZE-1));
-}
-
-/** 
- * Return the "partitionable endpoint" (pe) under which this device lies
- */
-static struct device_node * find_device_pe(struct device_node *dn)
-{
-	while ((dn->parent) && PCI_DN(dn->parent) &&
-	      (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
-		dn = dn->parent;
-	}
-	return dn;
-}
-
-/** Mark all devices that are peers of this device as failed.
- *  Mark the device driver too, so that it can see the failure
- *  immediately; this is critical, since some drivers poll
- *  status registers in interrupts ... If a driver is polling,
- *  and the slot is frozen, then the driver can deadlock in
- *  an interrupt context, which is bad.
- */
-
-static inline void __eeh_mark_slot (struct device_node *dn)
-{
-	while (dn) {
-		PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED;
-
-		if (dn->child)
-			__eeh_mark_slot (dn->child);
-		dn = dn->sibling;
-	}
-}
-
-static inline void __eeh_clear_slot (struct device_node *dn)
-{
-	while (dn) {
-		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED;
-		if (dn->child)
-			__eeh_clear_slot (dn->child);
-		dn = dn->sibling;
-	}
-}
-
-static inline void eeh_clear_slot (struct device_node *dn)
-{
-	unsigned long flags;
-	spin_lock_irqsave(&confirm_error_lock, flags);
-	__eeh_clear_slot (dn);
-	spin_unlock_irqrestore(&confirm_error_lock, flags);
-}
-
-/**
- * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze
- * @dn device node
- * @dev pci device, if known
- *
- * Check for an EEH failure for the given device node.  Call this
- * routine if the result of a read was all 0xff's and you want to
- * find out if this is due to an EEH slot freeze.  This routine
- * will query firmware for the EEH status.
- *
- * Returns 0 if there has not been an EEH error; otherwise returns
- * a non-zero value and queues up a slot isolation event notification.
- *
- * It is safe to call this routine in an interrupt context.
- */
-int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev)
-{
-	int ret;
-	int rets[3];
-	unsigned long flags;
-	int reset_state;
-	struct eeh_event  *event;
-	struct pci_dn *pdn;
-	struct device_node *pe_dn;
-	int rc = 0;
-
-	__get_cpu_var(total_mmio_ffs)++;
-
-	if (!eeh_subsystem_enabled)
-		return 0;
-
-	if (!dn) {
-		__get_cpu_var(no_dn)++;
-		return 0;
-	}
-	pdn = PCI_DN(dn);
-
-	/* Access to IO BARs might get this far and still not want checking. */
-	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
-	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
-		__get_cpu_var(ignored_check)++;
-#ifdef DEBUG
-		printk ("EEH:ignored check (%x) for %s %s\n", 
-		        pdn->eeh_mode, pci_name (dev), dn->full_name);
-#endif
-		return 0;
-	}
-
-	if (!pdn->eeh_config_addr) {
-		__get_cpu_var(no_cfg_addr)++;
-		return 0;
-	}
-
-	/* If we already have a pending isolation event for this
-	 * slot, we know it's bad already, we don't need to check.
-	 * Do this checking under a lock; as multiple PCI devices
-	 * in one slot might report errors simultaneously, and we
-	 * only want one error recovery routine running.
-	 */
-	spin_lock_irqsave(&confirm_error_lock, flags);
-	rc = 1;
-	if (pdn->eeh_mode & EEH_MODE_ISOLATED) {
-		pdn->eeh_check_count ++;
-		if (pdn->eeh_check_count >= EEH_MAX_FAILS) {
-			printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n",
-			        pdn->eeh_check_count);
-			dump_stack();
-			
-			/* re-read the slot reset state */
-			if (read_slot_reset_state(pdn, rets) != 0)
-				rets[0] = -1;	/* reset state unknown */
-
-			/* If we are here, then we hit an infinite loop. Stop. */
-			panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev));
-		}
-		goto dn_unlock;
-	}
-
-	/*
-	 * Now test for an EEH failure.  This is VERY expensive.
-	 * Note that the eeh_config_addr may be a parent device
-	 * in the case of a device behind a bridge, or it may be
-	 * function zero of a multi-function device.
-	 * In any case they must share a common PHB.
-	 */
-	ret = read_slot_reset_state(pdn, rets);
-
-	/* If the call to firmware failed, punt */
-	if (ret != 0) {
-		printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n",
-		       ret, dn->full_name);
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	/* If EEH is not supported on this device, punt. */
-	if (rets[1] != 1) {
-		printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n",
-		       ret, dn->full_name);
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	/* If not the kind of error we know about, punt. */
-	if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) {
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	/* Note that config-io to empty slots may fail;
-	 * we recognize empty because they don't have children. */
-	if ((rets[0] == 5) && (dn->child == NULL)) {
-		__get_cpu_var(false_positives)++;
-		rc = 0;
-		goto dn_unlock;
-	}
-
-	__get_cpu_var(slot_resets)++;
- 
-	/* Avoid repeated reports of this failure, including problems
-	 * with other functions on this device, and functions under
-	 * bridges. */
-	pe_dn = find_device_pe (dn);
-	__eeh_mark_slot (pe_dn);
-	spin_unlock_irqrestore(&confirm_error_lock, flags);
-
-	reset_state = rets[0];
-
-	eeh_slot_error_detail (pdn, 1 /* Temporary Error */);
-
-	printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n",
-	       rets[0], dn->name, dn->full_name);
-	event = kmalloc(sizeof(*event), GFP_ATOMIC);
-	if (event == NULL) {
-		eeh_panic(dev, reset_state);
-		return 1;
- 	}
-
-	event->dev = dev;
-	event->dn = dn;
-	event->reset_state = reset_state;
-
-	/* We may or may not be called in an interrupt context */
-	spin_lock_irqsave(&eeh_eventlist_lock, flags);
-	list_add(&event->list, &eeh_eventlist);
-	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
-
-	/* Most EEH events are due to device driver bugs.  Having
-	 * a stack trace will help the device-driver authors figure
-	 * out what happened.  So print that out. */
-	if (rets[0] != 5) dump_stack();
-	schedule_work(&eeh_event_wq);
-
-	return 1;
-
-dn_unlock:
-	spin_unlock_irqrestore(&confirm_error_lock, flags);
-	return rc;
-}
-
-EXPORT_SYMBOL_GPL(eeh_dn_check_failure);
-
-/**
- * eeh_check_failure - check if all 1's data is due to EEH slot freeze
- * @token i/o token, should be address in the form 0xA....
- * @val value, should be all 1's (XXX why do we need this arg??)
- *
- * Check for an EEH failure at the given token address.  Call this
- * routine if the result of a read was all 0xff's and you want to
- * find out if this is due to an EEH slot freeze event.  This routine
- * will query firmware for the EEH status.
- *
- * Note this routine is safe to call in an interrupt context.
- */
-unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
-{
-	unsigned long addr;
-	struct pci_dev *dev;
-	struct device_node *dn;
-
-	/* Finding the phys addr + pci device; this is pretty quick. */
-	addr = eeh_token_to_phys((unsigned long __force) token);
-	dev = pci_get_device_by_addr(addr);
-	if (!dev) {
-		__get_cpu_var(no_device)++;
-		return val;
-	}
-
-	dn = pci_device_to_OF_node(dev);
-	eeh_dn_check_failure (dn, dev);
-
-	pci_dev_put(dev);
-	return val;
-}
-
-EXPORT_SYMBOL(eeh_check_failure);
-
-struct eeh_early_enable_info {
-	unsigned int buid_hi;
-	unsigned int buid_lo;
-};
-
-/* Enable eeh for the given device node. */
-static void *early_enable_eeh(struct device_node *dn, void *data)
-{
-	struct eeh_early_enable_info *info = data;
-	int ret;
-	char *status = get_property(dn, "status", NULL);
-	u32 *class_code = (u32 *)get_property(dn, "class-code", NULL);
-	u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL);
-	u32 *device_id = (u32 *)get_property(dn, "device-id", NULL);
-	u32 *regs;
-	int enable;
-	struct pci_dn *pdn = PCI_DN(dn);
-
-	pdn->eeh_mode = 0;
-	pdn->eeh_check_count = 0;
-	pdn->eeh_freeze_count = 0;
-
-	if (status && strcmp(status, "ok") != 0)
-		return NULL;	/* ignore devices with bad status */
-
-	/* Ignore bad nodes. */
-	if (!class_code || !vendor_id || !device_id)
-		return NULL;
-
-	/* There is nothing to check on PCI to ISA bridges */
-	if (dn->type && !strcmp(dn->type, "isa")) {
-		pdn->eeh_mode |= EEH_MODE_NOCHECK;
-		return NULL;
-	}
-
-	/*
-	 * Now decide if we are going to "Disable" EEH checking
-	 * for this device.  We still run with the EEH hardware active,
-	 * but we won't be checking for ff's.  This means a driver
-	 * could return bad data (very bad!), an interrupt handler could
-	 * hang waiting on status bits that won't change, etc.
-	 * But there are a few cases like display devices that make sense.
-	 */
-	enable = 1;	/* i.e. we will do checking */
-	if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY)
-		enable = 0;
-
-	if (!enable)
-		pdn->eeh_mode |= EEH_MODE_NOCHECK;
-
-	/* Ok... see if this device supports EEH.  Some do, some don't,
-	 * and the only way to find out is to check each and every one. */
-	regs = (u32 *)get_property(dn, "reg", NULL);
-	if (regs) {
-		/* First register entry is addr (00BBSS00)  */
-		/* Try to enable eeh */
-		ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL,
-				regs[0], info->buid_hi, info->buid_lo,
-				EEH_ENABLE);
-		if (ret == 0) {
-			eeh_subsystem_enabled = 1;
-			pdn->eeh_mode |= EEH_MODE_SUPPORTED;
-			pdn->eeh_config_addr = regs[0];
-#ifdef DEBUG
-			printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name);
-#endif
-		} else {
-
-			/* This device doesn't support EEH, but it may have an
-			 * EEH parent, in which case we mark it as supported. */
-			if (dn->parent && PCI_DN(dn->parent)
-			    && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
-				/* Parent supports EEH. */
-				pdn->eeh_mode |= EEH_MODE_SUPPORTED;
-				pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr;
-				return NULL;
-			}
-		}
-	} else {
-		printk(KERN_WARNING "EEH: %s: unable to get reg property.\n",
-		       dn->full_name);
-	}
-
-	return NULL;
-}
-
-/*
- * Initialize EEH by trying to enable it for all of the adapters in the system.
- * As a side effect we can determine here if eeh is supported at all.
- * Note that we leave EEH on so failed config cycles won't cause a machine
- * check.  If a user turns off EEH for a particular adapter they are really
- * telling Linux to ignore errors.  Some hardware (e.g. POWER5) won't
- * grant access to a slot if EEH isn't enabled, and so we always enable
- * EEH for all slots/all devices.
- *
- * The eeh-force-off option disables EEH checking globally, for all slots.
- * Even if force-off is set, the EEH hardware is still enabled, so that
- * newer systems can boot.
- */
-void __init eeh_init(void)
-{
-	struct device_node *phb, *np;
-	struct eeh_early_enable_info info;
-
-	spin_lock_init(&confirm_error_lock);
-	spin_lock_init(&slot_errbuf_lock);
-
-	np = of_find_node_by_path("/rtas");
-	if (np == NULL)
-		return;
-
-	ibm_set_eeh_option = rtas_token("ibm,set-eeh-option");
-	ibm_set_slot_reset = rtas_token("ibm,set-slot-reset");
-	ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2");
-	ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state");
-	ibm_slot_error_detail = rtas_token("ibm,slot-error-detail");
-
-	if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE)
-		return;
-
-	eeh_error_buf_size = rtas_token("rtas-error-log-max");
-	if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
-		eeh_error_buf_size = 1024;
-	}
-	if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) {
-		printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated "
-		      "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX);
-		eeh_error_buf_size = RTAS_ERROR_LOG_MAX;
-	}
-
-	/* Enable EEH for all adapters.  Note that eeh requires buid's */
-	for (phb = of_find_node_by_name(NULL, "pci"); phb;
-	     phb = of_find_node_by_name(phb, "pci")) {
-		unsigned long buid;
-
-		buid = get_phb_buid(phb);
-		if (buid == 0 || PCI_DN(phb) == NULL)
-			continue;
-
-		info.buid_lo = BUID_LO(buid);
-		info.buid_hi = BUID_HI(buid);
-		traverse_pci_devices(phb, early_enable_eeh, &info);
-	}
-
-	if (eeh_subsystem_enabled)
-		printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n");
-	else
-		printk(KERN_WARNING "EEH: No capable adapters found\n");
-}
-
-/**
- * eeh_add_device_early - enable EEH for the indicated device_node
- * @dn: device node for which to set up EEH
- *
- * This routine must be used to perform EEH initialization for PCI
- * devices that were added after system boot (e.g. hotplug, dlpar).
- * This routine must be called before any i/o is performed to the
- * adapter (inluding any config-space i/o).
- * Whether this actually enables EEH or not for this device depends
- * on the CEC architecture, type of the device, on earlier boot
- * command-line arguments & etc.
- */
-void eeh_add_device_early(struct device_node *dn)
-{
-	struct pci_controller *phb;
-	struct eeh_early_enable_info info;
-
-	if (!dn || !PCI_DN(dn))
-		return;
-	phb = PCI_DN(dn)->phb;
-	if (NULL == phb || 0 == phb->buid) {
-		printk(KERN_WARNING "EEH: Expected buid but found none for %s\n",
-		       dn->full_name);
-		dump_stack();
-		return;
-	}
-
-	info.buid_hi = BUID_HI(phb->buid);
-	info.buid_lo = BUID_LO(phb->buid);
-	early_enable_eeh(dn, &info);
-}
-EXPORT_SYMBOL_GPL(eeh_add_device_early);
-
-/**
- * eeh_add_device_late - perform EEH initialization for the indicated pci device
- * @dev: pci device for which to set up EEH
- *
- * This routine must be used to complete EEH initialization for PCI
- * devices that were added after system boot (e.g. hotplug, dlpar).
- */
-void eeh_add_device_late(struct pci_dev *dev)
-{
-	struct device_node *dn;
-
-	if (!dev || !eeh_subsystem_enabled)
-		return;
-
-#ifdef DEBUG
-	printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev));
-#endif
-
-	pci_dev_get (dev);
-	dn = pci_device_to_OF_node(dev);
-	PCI_DN(dn)->pcidev = dev;
-
-	pci_addr_cache_insert_device (dev);
-}
-EXPORT_SYMBOL_GPL(eeh_add_device_late);
-
-/**
- * eeh_remove_device - undo EEH setup for the indicated pci device
- * @dev: pci device to be removed
- *
- * This routine should be when a device is removed from a running
- * system (e.g. by hotplug or dlpar).
- */
-void eeh_remove_device(struct pci_dev *dev)
-{
-	struct device_node *dn;
-	if (!dev || !eeh_subsystem_enabled)
-		return;
-
-	/* Unregister the device with the EEH/PCI address search system */
-#ifdef DEBUG
-	printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev));
-#endif
-	pci_addr_cache_remove_device(dev);
-
-	dn = pci_device_to_OF_node(dev);
-	PCI_DN(dn)->pcidev = NULL;
-	pci_dev_put (dev);
-}
-EXPORT_SYMBOL_GPL(eeh_remove_device);
-
-static int proc_eeh_show(struct seq_file *m, void *v)
-{
-	unsigned int cpu;
-	unsigned long ffs = 0, positives = 0, failures = 0;
-	unsigned long resets = 0;
-	unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;
-
-	for_each_cpu(cpu) {
-		ffs += per_cpu(total_mmio_ffs, cpu);
-		positives += per_cpu(false_positives, cpu);
-		failures += per_cpu(ignored_failures, cpu);
-		resets += per_cpu(slot_resets, cpu);
-		no_dev += per_cpu(no_device, cpu);
-		no_dn += per_cpu(no_dn, cpu);
-		no_cfg += per_cpu(no_cfg_addr, cpu);
-		no_check += per_cpu(ignored_check, cpu);
-	}
-
-	if (0 == eeh_subsystem_enabled) {
-		seq_printf(m, "EEH Subsystem is globally disabled\n");
-		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
-	} else {
-		seq_printf(m, "EEH Subsystem is enabled\n");
-		seq_printf(m,
-				"no device=%ld\n"
-				"no device node=%ld\n"
-				"no config address=%ld\n"
-				"check not wanted=%ld\n"
-				"eeh_total_mmio_ffs=%ld\n"
-				"eeh_false_positives=%ld\n"
-				"eeh_ignored_failures=%ld\n"
-				"eeh_slot_resets=%ld\n",
-				no_dev, no_dn, no_cfg, no_check,
-				ffs, positives, failures, resets);
-	}
-
-	return 0;
-}
-
-static int proc_eeh_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, proc_eeh_show, NULL);
-}
-
-static struct file_operations proc_eeh_operations = {
-	.open      = proc_eeh_open,
-	.read      = seq_read,
-	.llseek    = seq_lseek,
-	.release   = single_release,
-};
-
-static int __init eeh_init_proc(void)
-{
-	struct proc_dir_entry *e;
-
-	if (systemcfg->platform & PLATFORM_PSERIES) {
-		e = create_proc_entry("ppc64/eeh", 0, NULL);
-		if (e)
-			e->proc_fops = &proc_eeh_operations;
-	}
-
-	return 0;
-}
-__initcall(eeh_init_proc);
Index: linux-2.6.14-git3/arch/ppc64/kernel/Makefile
===================================================================
--- linux-2.6.14-git3.orig/arch/ppc64/kernel/Makefile	2005-11-02 14:29:22.485829789 -0600
+++ linux-2.6.14-git3/arch/ppc64/kernel/Makefile	2005-11-02 14:30:49.805589414 -0600
@@ -35,7 +35,6 @@
 			 bpa_iic.o spider-pic.o
 
 obj-$(CONFIG_KEXEC)		+= machine_kexec.o
-obj-$(CONFIG_EEH)		+= eeh.o
 obj-$(CONFIG_PROC_FS)		+= proc_ppc64.o
 obj-$(CONFIG_RTAS_FLASH)	+= rtas_flash.o
 obj-$(CONFIG_SMP)		+= smp.o
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.14-git3.orig/arch/powerpc/platforms/pseries/Makefile	2005-10-31 11:19:47.000000000 -0600
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/Makefile	2005-11-02 14:31:36.150092654 -0600
@@ -3,3 +3,4 @@
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_IBMVIO)	+= vio.o
 obj-$(CONFIG_XICS)	+= xics.o
+obj-$(CONFIG_EEH)    += eeh.o
Index: linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.14-git3/arch/powerpc/platforms/pseries/eeh.c	2005-11-02 14:30:49.790591516 -0600
@@ -0,0 +1,1093 @@
+/*
+ * eeh.c
+ * Copyright (C) 2001 Dave Engebretsen & Todd Inglett IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/notifier.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/rbtree.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
+#include <asm/atomic.h>
+#include <asm/eeh.h>
+#include <asm/io.h>
+#include <asm/machdep.h>
+#include <asm/rtas.h>
+#include <asm/atomic.h>
+#include <asm/systemcfg.h>
+#include <asm/ppc-pci.h>
+
+#undef DEBUG
+
+/** Overview:
+ *  EEH, or "Extended Error Handling" is a PCI bridge technology for
+ *  dealing with PCI bus errors that can't be dealt with within the
+ *  usual PCI framework, except by check-stopping the CPU.  Systems
+ *  that are designed for high-availability/reliability cannot afford
+ *  to crash due to a "mere" PCI error, thus the need for EEH.
+ *  An EEH-capable bridge operates by converting a detected error
+ *  into a "slot freeze", taking the PCI adapter off-line, making
+ *  the slot behave, from the OS'es point of view, as if the slot
+ *  were "empty": all reads return 0xff's and all writes are silently
+ *  ignored.  EEH slot isolation events can be triggered by parity
+ *  errors on the address or data busses (e.g. during posted writes),
+ *  which in turn might be caused by low voltage on the bus, dust,
+ *  vibration, humidity, radioactivity or plain-old failed hardware.
+ *
+ *  Note, however, that one of the leading causes of EEH slot
+ *  freeze events are buggy device drivers, buggy device microcode,
+ *  or buggy device hardware.  This is because any attempt by the
+ *  device to bus-master data to a memory address that is not
+ *  assigned to the device will trigger a slot freeze.   (The idea
+ *  is to prevent devices-gone-wild from corrupting system memory).
+ *  Buggy hardware/drivers will have a miserable time co-existing
+ *  with EEH.
+ *
+ *  Ideally, a PCI device driver, when suspecting that an isolation
+ *  event has occured (e.g. by reading 0xff's), will then ask EEH
+ *  whether this is the case, and then take appropriate steps to
+ *  reset the PCI slot, the PCI device, and then resume operations.
+ *  However, until that day,  the checking is done here, with the
+ *  eeh_check_failure() routine embedded in the MMIO macros.  If
+ *  the slot is found to be isolated, an "EEH Event" is synthesized
+ *  and sent out for processing.
+ */
+
+/* EEH event workqueue setup. */
+static DEFINE_SPINLOCK(eeh_eventlist_lock);
+LIST_HEAD(eeh_eventlist);
+static void eeh_event_handler(void *);
+DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL);
+
+static struct notifier_block *eeh_notifier_chain;
+
+/* If a device driver keeps reading an MMIO register in an interrupt
+ * handler after a slot isolation event has occurred, we assume it
+ * is broken and panic.  This sets the threshold for how many read
+ * attempts we allow before panicking.
+ */
+#define EEH_MAX_FAILS	100000
+
+/* RTAS tokens */
+static int ibm_set_eeh_option;
+static int ibm_set_slot_reset;
+static int ibm_read_slot_reset_state;
+static int ibm_read_slot_reset_state2;
+static int ibm_slot_error_detail;
+
+static int eeh_subsystem_enabled;
+
+/* Lock to avoid races due to multiple reports of an error */
+static DEFINE_SPINLOCK(confirm_error_lock);
+
+/* Buffer for reporting slot-error-detail rtas calls */
+static unsigned char slot_errbuf[RTAS_ERROR_LOG_MAX];
+static DEFINE_SPINLOCK(slot_errbuf_lock);
+static int eeh_error_buf_size;
+
+/* System monitoring statistics */
+static DEFINE_PER_CPU(unsigned long, no_device);
+static DEFINE_PER_CPU(unsigned long, no_dn);
+static DEFINE_PER_CPU(unsigned long, no_cfg_addr);
+static DEFINE_PER_CPU(unsigned long, ignored_check);
+static DEFINE_PER_CPU(unsigned long, total_mmio_ffs);
+static DEFINE_PER_CPU(unsigned long, false_positives);
+static DEFINE_PER_CPU(unsigned long, ignored_failures);
+static DEFINE_PER_CPU(unsigned long, slot_resets);
+
+/**
+ * The pci address cache subsystem.  This subsystem places
+ * PCI device address resources into a red-black tree, sorted
+ * according to the address range, so that given only an i/o
+ * address, the corresponding PCI device can be **quickly**
+ * found. It is safe to perform an address lookup in an interrupt
+ * context; this ability is an important feature.
+ *
+ * Currently, the only customer of this code is the EEH subsystem;
+ * thus, this code has been somewhat tailored to suit EEH better.
+ * In particular, the cache does *not* hold the addresses of devices
+ * for which EEH is not enabled.
+ *
+ * (Implementation Note: The RB tree seems to be better/faster
+ * than any hash algo I could think of for this problem, even
+ * with the penalty of slow pointer chases for d-cache misses).
+ */
+struct pci_io_addr_range
+{
+	struct rb_node rb_node;
+	unsigned long addr_lo;
+	unsigned long addr_hi;
+	struct pci_dev *pcidev;
+	unsigned int flags;
+};
+
+static struct pci_io_addr_cache
+{
+	struct rb_root rb_root;
+	spinlock_t piar_lock;
+} pci_io_addr_cache_root;
+
+static inline struct pci_dev *__pci_get_device_by_addr(unsigned long addr)
+{
+	struct rb_node *n = pci_io_addr_cache_root.rb_root.rb_node;
+
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+
+		if (addr < piar->addr_lo) {
+			n = n->rb_left;
+		} else {
+			if (addr > piar->addr_hi) {
+				n = n->rb_right;
+			} else {
+				pci_dev_get(piar->pcidev);
+				return piar->pcidev;
+			}
+		}
+	}
+
+	return NULL;
+}
+
+/**
+ * pci_get_device_by_addr - Get device, given only address
+ * @addr: mmio (PIO) phys address or i/o port number
+ *
+ * Given an mmio phys address, or a port number, find a pci device
+ * that implements this address.  Be sure to pci_dev_put the device
+ * when finished.  I/O port numbers are assumed to be offset
+ * from zero (that is, they do *not* have pci_io_addr added in).
+ * It is safe to call this function within an interrupt.
+ */
+static struct pci_dev *pci_get_device_by_addr(unsigned long addr)
+{
+	struct pci_dev *dev;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	dev = __pci_get_device_by_addr(addr);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+	return dev;
+}
+
+#ifdef DEBUG
+/*
+ * Handy-dandy debug print routine, does nothing more
+ * than print out the contents of our addr cache.
+ */
+static void pci_addr_cache_print(struct pci_io_addr_cache *cache)
+{
+	struct rb_node *n;
+	int cnt = 0;
+
+	n = rb_first(&cache->rb_root);
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+		printk(KERN_DEBUG "PCI: %s addr range %d [%lx-%lx]: %s\n",
+		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
+		       piar->addr_lo, piar->addr_hi, pci_name(piar->pcidev));
+		cnt++;
+		n = rb_next(n);
+	}
+}
+#endif
+
+/* Insert address range into the rb tree. */
+static struct pci_io_addr_range *
+pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
+		      unsigned long ahi, unsigned int flags)
+{
+	struct rb_node **p = &pci_io_addr_cache_root.rb_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct pci_io_addr_range *piar;
+
+	/* Walk tree, find a place to insert into tree */
+	while (*p) {
+		parent = *p;
+		piar = rb_entry(parent, struct pci_io_addr_range, rb_node);
+		if (ahi < piar->addr_lo) {
+			p = &parent->rb_left;
+		} else if (alo > piar->addr_hi) {
+			p = &parent->rb_right;
+		} else {
+			if (dev != piar->pcidev ||
+			    alo != piar->addr_lo || ahi != piar->addr_hi) {
+				printk(KERN_WARNING "PIAR: overlapping address range\n");
+			}
+			return piar;
+		}
+	}
+	piar = (struct pci_io_addr_range *)kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
+	if (!piar)
+		return NULL;
+
+	piar->addr_lo = alo;
+	piar->addr_hi = ahi;
+	piar->pcidev = dev;
+	piar->flags = flags;
+
+#ifdef DEBUG
+	printk(KERN_DEBUG "PIAR: insert range=[%lx:%lx] dev=%s\n",
+	                  alo, ahi, pci_name (dev));
+#endif
+
+	rb_link_node(&piar->rb_node, parent, p);
+	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
+
+	return piar;
+}
+
+static void __pci_addr_cache_insert_device(struct pci_dev *dev)
+{
+	struct device_node *dn;
+	struct pci_dn *pdn;
+	int i;
+	int inserted = 0;
+
+	dn = pci_device_to_OF_node(dev);
+	if (!dn) {
+		printk(KERN_WARNING "PCI: no pci dn found for dev=%s\n", pci_name(dev));
+		return;
+	}
+
+	/* Skip any devices for which EEH is not enabled. */
+	pdn = PCI_DN(dn);
+	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
+	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
+#ifdef DEBUG
+		printk(KERN_INFO "PCI: skip building address cache for=%s - %s\n",
+		       pci_name(dev), pdn->node->full_name);
+#endif
+		return;
+	}
+
+	/* The cache holds a reference to the device... */
+	pci_dev_get(dev);
+
+	/* Walk resources on this device, poke them into the tree */
+	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+		unsigned long start = pci_resource_start(dev,i);
+		unsigned long end = pci_resource_end(dev,i);
+		unsigned int flags = pci_resource_flags(dev,i);
+
+		/* We are interested only bus addresses, not dma or other stuff */
+		if (0 == (flags & (IORESOURCE_IO | IORESOURCE_MEM)))
+			continue;
+		if (start == 0 || ~start == 0 || end == 0 || ~end == 0)
+			 continue;
+		pci_addr_cache_insert(dev, start, end, flags);
+		inserted = 1;
+	}
+
+	/* If there was nothing to add, the cache has no reference... */
+	if (!inserted)
+		pci_dev_put(dev);
+}
+
+/**
+ * pci_addr_cache_insert_device - Add a device to the address cache
+ * @dev: PCI device whose I/O addresses we are interested in.
+ *
+ * In order to support the fast lookup of devices based on addresses,
+ * we maintain a cache of devices that can be quickly searched.
+ * This routine adds a device to that cache.
+ */
+static void pci_addr_cache_insert_device(struct pci_dev *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	__pci_addr_cache_insert_device(dev);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+}
+
+static inline void __pci_addr_cache_remove_device(struct pci_dev *dev)
+{
+	struct rb_node *n;
+	int removed = 0;
+
+restart:
+	n = rb_first(&pci_io_addr_cache_root.rb_root);
+	while (n) {
+		struct pci_io_addr_range *piar;
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+
+		if (piar->pcidev == dev) {
+			rb_erase(n, &pci_io_addr_cache_root.rb_root);
+			removed = 1;
+			kfree(piar);
+			goto restart;
+		}
+		n = rb_next(n);
+	}
+
+	/* The cache no longer holds its reference to this device... */
+	if (removed)
+		pci_dev_put(dev);
+}
+
+/**
+ * pci_addr_cache_remove_device - remove pci device from addr cache
+ * @dev: device to remove
+ *
+ * Remove a device from the addr-cache tree.
+ * This is potentially expensive, since it will walk
+ * the tree multiple times (once per resource).
+ * But so what; device removal doesn't need to be that fast.
+ */
+static void pci_addr_cache_remove_device(struct pci_dev *dev)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
+	__pci_addr_cache_remove_device(dev);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
+}
+
+/**
+ * pci_addr_cache_build - Build a cache of I/O addresses
+ *
+ * Build a cache of pci i/o addresses.  This cache will be used to
+ * find the pci device that corresponds to a given address.
+ * This routine scans all pci busses to build the cache.
+ * Must be run late in boot process, after the pci controllers
+ * have been scaned for devices (after all device resources are known).
+ */
+void __init pci_addr_cache_build(void)
+{
+	struct pci_dev *dev = NULL;
+
+	if (!eeh_subsystem_enabled)
+		return;
+
+	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
+
+	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
+		/* Ignore PCI bridges ( XXX why ??) */
+		if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE) {
+			continue;
+		}
+		pci_addr_cache_insert_device(dev);
+	}
+
+#ifdef DEBUG
+	/* Verify tree built up above, echo back the list of addrs. */
+	pci_addr_cache_print(&pci_io_addr_cache_root);
+#endif
+}
+
+/* --------------------------------------------------------------- */
+/* Above lies the PCI Address Cache. Below lies the EEH event infrastructure */
+
+void eeh_slot_error_detail (struct pci_dn *pdn, int severity)
+{
+	unsigned long flags;
+	int rc;
+
+	/* Log the error with the rtas logger */
+	spin_lock_irqsave(&slot_errbuf_lock, flags);
+	memset(slot_errbuf, 0, eeh_error_buf_size);
+
+	rc = rtas_call(ibm_slot_error_detail,
+	               8, 1, NULL, pdn->eeh_config_addr,
+	               BUID_HI(pdn->phb->buid),
+	               BUID_LO(pdn->phb->buid), NULL, 0,
+	               virt_to_phys(slot_errbuf),
+	               eeh_error_buf_size,
+	               severity);
+
+	if (rc == 0)
+		log_error(slot_errbuf, ERR_TYPE_RTAS_LOG, 0);
+	spin_unlock_irqrestore(&slot_errbuf_lock, flags);
+}
+
+/**
+ * eeh_register_notifier - Register to find out about EEH events.
+ * @nb: notifier block to callback on events
+ */
+int eeh_register_notifier(struct notifier_block *nb)
+{
+	return notifier_chain_register(&eeh_notifier_chain, nb);
+}
+
+/**
+ * eeh_unregister_notifier - Unregister to an EEH event notifier.
+ * @nb: notifier block to callback on events
+ */
+int eeh_unregister_notifier(struct notifier_block *nb)
+{
+	return notifier_chain_unregister(&eeh_notifier_chain, nb);
+}
+
+/**
+ * read_slot_reset_state - Read the reset state of a device node's slot
+ * @dn: device node to read
+ * @rets: array to return results in
+ */
+static int read_slot_reset_state(struct pci_dn *pdn, int rets[])
+{
+	int token, outputs;
+
+	if (ibm_read_slot_reset_state2 != RTAS_UNKNOWN_SERVICE) {
+		token = ibm_read_slot_reset_state2;
+		outputs = 4;
+	} else {
+		token = ibm_read_slot_reset_state;
+		rets[2] = 0; /* fake PE Unavailable info */
+		outputs = 3;
+	}
+
+	return rtas_call(token, 3, outputs, rets, pdn->eeh_config_addr,
+			 BUID_HI(pdn->phb->buid), BUID_LO(pdn->phb->buid));
+}
+
+/**
+ * eeh_panic - call panic() for an eeh event that cannot be handled.
+ * The philosophy of this routine is that it is better to panic and
+ * halt the OS than it is to risk possible data corruption by
+ * oblivious device drivers that don't know better.
+ *
+ * @dev pci device that had an eeh event
+ * @reset_state current reset state of the device slot
+ */
+static void eeh_panic(struct pci_dev *dev, int reset_state)
+{
+	/*
+	 * XXX We should create a separate sysctl for this.
+	 *
+	 * Since the panic_on_oops sysctl is used to halt the system
+	 * in light of potential corruption, we can use it here.
+	 */
+	if (panic_on_oops) {
+		struct device_node *dn = pci_device_to_OF_node(dev);
+		eeh_slot_error_detail (PCI_DN(dn), 2 /* Permanent Error */);
+		panic("EEH: MMIO failure (%d) on device:%s\n", reset_state,
+		      pci_name(dev));
+	}
+	else {
+		__get_cpu_var(ignored_failures)++;
+		printk(KERN_INFO "EEH: Ignored MMIO failure (%d) on device:%s\n",
+		       reset_state, pci_name(dev));
+	}
+}
+
+/**
+ * eeh_event_handler - dispatch EEH events.  The detection of a frozen
+ * slot can occur inside an interrupt, where it can be hard to do
+ * anything about it.  The goal of this routine is to pull these
+ * detection events out of the context of the interrupt handler, and
+ * re-dispatch them for processing at a later time in a normal context.
+ *
+ * @dummy - unused
+ */
+static void eeh_event_handler(void *dummy)
+{
+	unsigned long flags;
+	struct eeh_event	*event;
+
+	while (1) {
+		spin_lock_irqsave(&eeh_eventlist_lock, flags);
+		event = NULL;
+		if (!list_empty(&eeh_eventlist)) {
+			event = list_entry(eeh_eventlist.next, struct eeh_event, list);
+			list_del(&event->list);
+		}
+		spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
+		if (event == NULL)
+			break;
+
+		printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device "
+		       "%s\n", event->reset_state,
+		       pci_name(event->dev));
+
+		notifier_call_chain (&eeh_notifier_chain,
+				     EEH_NOTIFY_FREEZE, event);
+
+		pci_dev_put(event->dev);
+		kfree(event);
+	}
+}
+
+/**
+ * eeh_token_to_phys - convert EEH address token to phys address
+ * @token i/o token, should be address in the form 0xA....
+ */
+static inline unsigned long eeh_token_to_phys(unsigned long token)
+{
+	pte_t *ptep;
+	unsigned long pa;
+
+	ptep = find_linux_pte(init_mm.pgd, token);
+	if (!ptep)
+		return token;
+	pa = pte_pfn(*ptep) << PAGE_SHIFT;
+
+	return pa | (token & (PAGE_SIZE-1));
+}
+
+/** 
+ * Return the "partitionable endpoint" (pe) under which this device lies
+ */
+static struct device_node * find_device_pe(struct device_node *dn)
+{
+	while ((dn->parent) && PCI_DN(dn->parent) &&
+	      (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
+		dn = dn->parent;
+	}
+	return dn;
+}
+
+/** Mark all devices that are peers of this device as failed.
+ *  Mark the device driver too, so that it can see the failure
+ *  immediately; this is critical, since some drivers poll
+ *  status registers in interrupts ... If a driver is polling,
+ *  and the slot is frozen, then the driver can deadlock in
+ *  an interrupt context, which is bad.
+ */
+
+static inline void __eeh_mark_slot (struct device_node *dn)
+{
+	while (dn) {
+		PCI_DN(dn)->eeh_mode |= EEH_MODE_ISOLATED;
+
+		if (dn->child)
+			__eeh_mark_slot (dn->child);
+		dn = dn->sibling;
+	}
+}
+
+static inline void __eeh_clear_slot (struct device_node *dn)
+{
+	while (dn) {
+		PCI_DN(dn)->eeh_mode &= ~EEH_MODE_ISOLATED;
+		if (dn->child)
+			__eeh_clear_slot (dn->child);
+		dn = dn->sibling;
+	}
+}
+
+static inline void eeh_clear_slot (struct device_node *dn)
+{
+	unsigned long flags;
+	spin_lock_irqsave(&confirm_error_lock, flags);
+	__eeh_clear_slot (dn);
+	spin_unlock_irqrestore(&confirm_error_lock, flags);
+}
+
+/**
+ * eeh_dn_check_failure - check if all 1's data is due to EEH slot freeze
+ * @dn device node
+ * @dev pci device, if known
+ *
+ * Check for an EEH failure for the given device node.  Call this
+ * routine if the result of a read was all 0xff's and you want to
+ * find out if this is due to an EEH slot freeze.  This routine
+ * will query firmware for the EEH status.
+ *
+ * Returns 0 if there has not been an EEH error; otherwise returns
+ * a non-zero value and queues up a slot isolation event notification.
+ *
+ * It is safe to call this routine in an interrupt context.
+ */
+int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev)
+{
+	int ret;
+	int rets[3];
+	unsigned long flags;
+	int reset_state;
+	struct eeh_event  *event;
+	struct pci_dn *pdn;
+	struct device_node *pe_dn;
+	int rc = 0;
+
+	__get_cpu_var(total_mmio_ffs)++;
+
+	if (!eeh_subsystem_enabled)
+		return 0;
+
+	if (!dn) {
+		__get_cpu_var(no_dn)++;
+		return 0;
+	}
+	pdn = PCI_DN(dn);
+
+	/* Access to IO BARs might get this far and still not want checking. */
+	if (!(pdn->eeh_mode & EEH_MODE_SUPPORTED) ||
+	    pdn->eeh_mode & EEH_MODE_NOCHECK) {
+		__get_cpu_var(ignored_check)++;
+#ifdef DEBUG
+		printk ("EEH:ignored check (%x) for %s %s\n", 
+		        pdn->eeh_mode, pci_name (dev), dn->full_name);
+#endif
+		return 0;
+	}
+
+	if (!pdn->eeh_config_addr) {
+		__get_cpu_var(no_cfg_addr)++;
+		return 0;
+	}
+
+	/* If we already have a pending isolation event for this
+	 * slot, we know it's bad already, we don't need to check.
+	 * Do this checking under a lock; as multiple PCI devices
+	 * in one slot might report errors simultaneously, and we
+	 * only want one error recovery routine running.
+	 */
+	spin_lock_irqsave(&confirm_error_lock, flags);
+	rc = 1;
+	if (pdn->eeh_mode & EEH_MODE_ISOLATED) {
+		pdn->eeh_check_count ++;
+		if (pdn->eeh_check_count >= EEH_MAX_FAILS) {
+			printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n",
+			        pdn->eeh_check_count);
+			dump_stack();
+			
+			/* re-read the slot reset state */
+			if (read_slot_reset_state(pdn, rets) != 0)
+				rets[0] = -1;	/* reset state unknown */
+
+			/* If we are here, then we hit an infinite loop. Stop. */
+			panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev));
+		}
+		goto dn_unlock;
+	}
+
+	/*
+	 * Now test for an EEH failure.  This is VERY expensive.
+	 * Note that the eeh_config_addr may be a parent device
+	 * in the case of a device behind a bridge, or it may be
+	 * function zero of a multi-function device.
+	 * In any case they must share a common PHB.
+	 */
+	ret = read_slot_reset_state(pdn, rets);
+
+	/* If the call to firmware failed, punt */
+	if (ret != 0) {
+		printk(KERN_WARNING "EEH: read_slot_reset_state() failed; rc=%d dn=%s\n",
+		       ret, dn->full_name);
+		__get_cpu_var(false_positives)++;
+		rc = 0;
+		goto dn_unlock;
+	}
+
+	/* If EEH is not supported on this device, punt. */
+	if (rets[1] != 1) {
+		printk(KERN_WARNING "EEH: event on unsupported device, rc=%d dn=%s\n",
+		       ret, dn->full_name);
+		__get_cpu_var(false_positives)++;
+		rc = 0;
+		goto dn_unlock;
+	}
+
+	/* If not the kind of error we know about, punt. */
+	if (rets[0] != 2 && rets[0] != 4 && rets[0] != 5) {
+		__get_cpu_var(false_positives)++;
+		rc = 0;
+		goto dn_unlock;
+	}
+
+	/* Note that config-io to empty slots may fail;
+	 * we recognize empty because they don't have children. */
+	if ((rets[0] == 5) && (dn->child == NULL)) {
+		__get_cpu_var(false_positives)++;
+		rc = 0;
+		goto dn_unlock;
+	}
+
+	__get_cpu_var(slot_resets)++;
+ 
+	/* Avoid repeated reports of this failure, including problems
+	 * with other functions on this device, and functions under
+	 * bridges. */
+	pe_dn = find_device_pe (dn);
+	__eeh_mark_slot (pe_dn);
+	spin_unlock_irqrestore(&confirm_error_lock, flags);
+
+	reset_state = rets[0];
+
+	eeh_slot_error_detail (pdn, 1 /* Temporary Error */);
+
+	printk(KERN_INFO "EEH: MMIO failure (%d) on device: %s %s\n",
+	       rets[0], dn->name, dn->full_name);
+	event = kmalloc(sizeof(*event), GFP_ATOMIC);
+	if (event == NULL) {
+		eeh_panic(dev, reset_state);
+		return 1;
+ 	}
+
+	event->dev = dev;
+	event->dn = dn;
+	event->reset_state = reset_state;
+
+	/* We may or may not be called in an interrupt context */
+	spin_lock_irqsave(&eeh_eventlist_lock, flags);
+	list_add(&event->list, &eeh_eventlist);
+	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
+
+	/* Most EEH events are due to device driver bugs.  Having
+	 * a stack trace will help the device-driver authors figure
+	 * out what happened.  So print that out. */
+	if (rets[0] != 5) dump_stack();
+	schedule_work(&eeh_event_wq);
+
+	return 1;
+
+dn_unlock:
+	spin_unlock_irqrestore(&confirm_error_lock, flags);
+	return rc;
+}
+
+EXPORT_SYMBOL_GPL(eeh_dn_check_failure);
+
+/**
+ * eeh_check_failure - check if all 1's data is due to EEH slot freeze
+ * @token i/o token, should be address in the form 0xA....
+ * @val value, should be all 1's (XXX why do we need this arg??)
+ *
+ * Check for an EEH failure at the given token address.  Call this
+ * routine if the result of a read was all 0xff's and you want to
+ * find out if this is due to an EEH slot freeze event.  This routine
+ * will query firmware for the EEH status.
+ *
+ * Note this routine is safe to call in an interrupt context.
+ */
+unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
+{
+	unsigned long addr;
+	struct pci_dev *dev;
+	struct device_node *dn;
+
+	/* Finding the phys addr + pci device; this is pretty quick. */
+	addr = eeh_token_to_phys((unsigned long __force) token);
+	dev = pci_get_device_by_addr(addr);
+	if (!dev) {
+		__get_cpu_var(no_device)++;
+		return val;
+	}
+
+	dn = pci_device_to_OF_node(dev);
+	eeh_dn_check_failure (dn, dev);
+
+	pci_dev_put(dev);
+	return val;
+}
+
+EXPORT_SYMBOL(eeh_check_failure);
+
+struct eeh_early_enable_info {
+	unsigned int buid_hi;
+	unsigned int buid_lo;
+};
+
+/* Enable eeh for the given device node. */
+static void *early_enable_eeh(struct device_node *dn, void *data)
+{
+	struct eeh_early_enable_info *info = data;
+	int ret;
+	char *status = get_property(dn, "status", NULL);
+	u32 *class_code = (u32 *)get_property(dn, "class-code", NULL);
+	u32 *vendor_id = (u32 *)get_property(dn, "vendor-id", NULL);
+	u32 *device_id = (u32 *)get_property(dn, "device-id", NULL);
+	u32 *regs;
+	int enable;
+	struct pci_dn *pdn = PCI_DN(dn);
+
+	pdn->eeh_mode = 0;
+	pdn->eeh_check_count = 0;
+	pdn->eeh_freeze_count = 0;
+
+	if (status && strcmp(status, "ok") != 0)
+		return NULL;	/* ignore devices with bad status */
+
+	/* Ignore bad nodes. */
+	if (!class_code || !vendor_id || !device_id)
+		return NULL;
+
+	/* There is nothing to check on PCI to ISA bridges */
+	if (dn->type && !strcmp(dn->type, "isa")) {
+		pdn->eeh_mode |= EEH_MODE_NOCHECK;
+		return NULL;
+	}
+
+	/*
+	 * Now decide if we are going to "Disable" EEH checking
+	 * for this device.  We still run with the EEH hardware active,
+	 * but we won't be checking for ff's.  This means a driver
+	 * could return bad data (very bad!), an interrupt handler could
+	 * hang waiting on status bits that won't change, etc.
+	 * But there are a few cases like display devices that make sense.
+	 */
+	enable = 1;	/* i.e. we will do checking */
+	if ((*class_code >> 16) == PCI_BASE_CLASS_DISPLAY)
+		enable = 0;
+
+	if (!enable)
+		pdn->eeh_mode |= EEH_MODE_NOCHECK;
+
+	/* Ok... see if this device supports EEH.  Some do, some don't,
+	 * and the only way to find out is to check each and every one. */
+	regs = (u32 *)get_property(dn, "reg", NULL);
+	if (regs) {
+		/* First register entry is addr (00BBSS00)  */
+		/* Try to enable eeh */
+		ret = rtas_call(ibm_set_eeh_option, 4, 1, NULL,
+				regs[0], info->buid_hi, info->buid_lo,
+				EEH_ENABLE);
+		if (ret == 0) {
+			eeh_subsystem_enabled = 1;
+			pdn->eeh_mode |= EEH_MODE_SUPPORTED;
+			pdn->eeh_config_addr = regs[0];
+#ifdef DEBUG
+			printk(KERN_DEBUG "EEH: %s: eeh enabled\n", dn->full_name);
+#endif
+		} else {
+
+			/* This device doesn't support EEH, but it may have an
+			 * EEH parent, in which case we mark it as supported. */
+			if (dn->parent && PCI_DN(dn->parent)
+			    && (PCI_DN(dn->parent)->eeh_mode & EEH_MODE_SUPPORTED)) {
+				/* Parent supports EEH. */
+				pdn->eeh_mode |= EEH_MODE_SUPPORTED;
+				pdn->eeh_config_addr = PCI_DN(dn->parent)->eeh_config_addr;
+				return NULL;
+			}
+		}
+	} else {
+		printk(KERN_WARNING "EEH: %s: unable to get reg property.\n",
+		       dn->full_name);
+	}
+
+	return NULL;
+}
+
+/*
+ * Initialize EEH by trying to enable it for all of the adapters in the system.
+ * As a side effect we can determine here if eeh is supported at all.
+ * Note that we leave EEH on so failed config cycles won't cause a machine
+ * check.  If a user turns off EEH for a particular adapter they are really
+ * telling Linux to ignore errors.  Some hardware (e.g. POWER5) won't
+ * grant access to a slot if EEH isn't enabled, and so we always enable
+ * EEH for all slots/all devices.
+ *
+ * The eeh-force-off option disables EEH checking globally, for all slots.
+ * Even if force-off is set, the EEH hardware is still enabled, so that
+ * newer systems can boot.
+ */
+void __init eeh_init(void)
+{
+	struct device_node *phb, *np;
+	struct eeh_early_enable_info info;
+
+	spin_lock_init(&confirm_error_lock);
+	spin_lock_init(&slot_errbuf_lock);
+
+	np = of_find_node_by_path("/rtas");
+	if (np == NULL)
+		return;
+
+	ibm_set_eeh_option = rtas_token("ibm,set-eeh-option");
+	ibm_set_slot_reset = rtas_token("ibm,set-slot-reset");
+	ibm_read_slot_reset_state2 = rtas_token("ibm,read-slot-reset-state2");
+	ibm_read_slot_reset_state = rtas_token("ibm,read-slot-reset-state");
+	ibm_slot_error_detail = rtas_token("ibm,slot-error-detail");
+
+	if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE)
+		return;
+
+	eeh_error_buf_size = rtas_token("rtas-error-log-max");
+	if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
+		eeh_error_buf_size = 1024;
+	}
+	if (eeh_error_buf_size > RTAS_ERROR_LOG_MAX) {
+		printk(KERN_WARNING "EEH: rtas-error-log-max is bigger than allocated "
+		      "buffer ! (%d vs %d)", eeh_error_buf_size, RTAS_ERROR_LOG_MAX);
+		eeh_error_buf_size = RTAS_ERROR_LOG_MAX;
+	}
+
+	/* Enable EEH for all adapters.  Note that eeh requires buid's */
+	for (phb = of_find_node_by_name(NULL, "pci"); phb;
+	     phb = of_find_node_by_name(phb, "pci")) {
+		unsigned long buid;
+
+		buid = get_phb_buid(phb);
+		if (buid == 0 || PCI_DN(phb) == NULL)
+			continue;
+
+		info.buid_lo = BUID_LO(buid);
+		info.buid_hi = BUID_HI(buid);
+		traverse_pci_devices(phb, early_enable_eeh, &info);
+	}
+
+	if (eeh_subsystem_enabled)
+		printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n");
+	else
+		printk(KERN_WARNING "EEH: No capable adapters found\n");
+}
+
+/**
+ * eeh_add_device_early - enable EEH for the indicated device_node
+ * @dn: device node for which to set up EEH
+ *
+ * This routine must be used to perform EEH initialization for PCI
+ * devices that were added after system boot (e.g. hotplug, dlpar).
+ * This routine must be called before any i/o is performed to the
+ * adapter (inluding any config-space i/o).
+ * Whether this actually enables EEH or not for this device depends
+ * on the CEC architecture, type of the device, on earlier boot
+ * command-line arguments & etc.
+ */
+void eeh_add_device_early(struct device_node *dn)
+{
+	struct pci_controller *phb;
+	struct eeh_early_enable_info info;
+
+	if (!dn || !PCI_DN(dn))
+		return;
+	phb = PCI_DN(dn)->phb;
+	if (NULL == phb || 0 == phb->buid) {
+		printk(KERN_WARNING "EEH: Expected buid but found none for %s\n",
+		       dn->full_name);
+		dump_stack();
+		return;
+	}
+
+	info.buid_hi = BUID_HI(phb->buid);
+	info.buid_lo = BUID_LO(phb->buid);
+	early_enable_eeh(dn, &info);
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_early);
+
+/**
+ * eeh_add_device_late - perform EEH initialization for the indicated pci device
+ * @dev: pci device for which to set up EEH
+ *
+ * This routine must be used to complete EEH initialization for PCI
+ * devices that were added after system boot (e.g. hotplug, dlpar).
+ */
+void eeh_add_device_late(struct pci_dev *dev)
+{
+	struct device_node *dn;
+
+	if (!dev || !eeh_subsystem_enabled)
+		return;
+
+#ifdef DEBUG
+	printk(KERN_DEBUG "EEH: adding device %s\n", pci_name(dev));
+#endif
+
+	pci_dev_get (dev);
+	dn = pci_device_to_OF_node(dev);
+	PCI_DN(dn)->pcidev = dev;
+
+	pci_addr_cache_insert_device (dev);
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_late);
+
+/**
+ * eeh_remove_device - undo EEH setup for the indicated pci device
+ * @dev: pci device to be removed
+ *
+ * This routine should be when a device is removed from a running
+ * system (e.g. by hotplug or dlpar).
+ */
+void eeh_remove_device(struct pci_dev *dev)
+{
+	struct device_node *dn;
+	if (!dev || !eeh_subsystem_enabled)
+		return;
+
+	/* Unregister the device with the EEH/PCI address search system */
+#ifdef DEBUG
+	printk(KERN_DEBUG "EEH: remove device %s\n", pci_name(dev));
+#endif
+	pci_addr_cache_remove_device(dev);
+
+	dn = pci_device_to_OF_node(dev);
+	PCI_DN(dn)->pcidev = NULL;
+	pci_dev_put (dev);
+}
+EXPORT_SYMBOL_GPL(eeh_remove_device);
+
+static int proc_eeh_show(struct seq_file *m, void *v)
+{
+	unsigned int cpu;
+	unsigned long ffs = 0, positives = 0, failures = 0;
+	unsigned long resets = 0;
+	unsigned long no_dev = 0, no_dn = 0, no_cfg = 0, no_check = 0;
+
+	for_each_cpu(cpu) {
+		ffs += per_cpu(total_mmio_ffs, cpu);
+		positives += per_cpu(false_positives, cpu);
+		failures += per_cpu(ignored_failures, cpu);
+		resets += per_cpu(slot_resets, cpu);
+		no_dev += per_cpu(no_device, cpu);
+		no_dn += per_cpu(no_dn, cpu);
+		no_cfg += per_cpu(no_cfg_addr, cpu);
+		no_check += per_cpu(ignored_check, cpu);
+	}
+
+	if (0 == eeh_subsystem_enabled) {
+		seq_printf(m, "EEH Subsystem is globally disabled\n");
+		seq_printf(m, "eeh_total_mmio_ffs=%ld\n", ffs);
+	} else {
+		seq_printf(m, "EEH Subsystem is enabled\n");
+		seq_printf(m,
+				"no device=%ld\n"
+				"no device node=%ld\n"
+				"no config address=%ld\n"
+				"check not wanted=%ld\n"
+				"eeh_total_mmio_ffs=%ld\n"
+				"eeh_false_positives=%ld\n"
+				"eeh_ignored_failures=%ld\n"
+				"eeh_slot_resets=%ld\n",
+				no_dev, no_dn, no_cfg, no_check,
+				ffs, positives, failures, resets);
+	}
+
+	return 0;
+}
+
+static int proc_eeh_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, proc_eeh_show, NULL);
+}
+
+static struct file_operations proc_eeh_operations = {
+	.open      = proc_eeh_open,
+	.read      = seq_read,
+	.llseek    = seq_lseek,
+	.release   = single_release,
+};
+
+static int __init eeh_init_proc(void)
+{
+	struct proc_dir_entry *e;
+
+	if (systemcfg->platform & PLATFORM_PSERIES) {
+		e = create_proc_entry("ppc64/eeh", 0, NULL);
+		if (e)
+			e->proc_fops = &proc_eeh_operations;
+	}
+
+	return 0;
+}
+__initcall(eeh_init_proc);

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 29/42]: ethernet: add PCI error recovery to e100 dev driver
  2005-11-04  0:53 ` [PATCH 29/42]: ethernet: add PCI error recovery to e100 " Linas Vepstas
@ 2005-11-04  1:34   ` Jesse Brandeburg
  2005-11-04  1:51     ` Jesse Brandeburg
  0 siblings, 1 reply; 131+ messages in thread
From: Jesse Brandeburg @ 2005-11-04  1:34 UTC (permalink / raw)
  To: linas
  Cc: paulus, linuxppc64-dev, johnrose, linux-pci, bluesmoke-devel,
	linux-kernel

On 11/3/05, Linas Vepstas <linas@linas.org> wrote:
> Various PCI bus errors can be signaled by newer PCI controllers.  This
> patch adds the PCI error recovery callbacks to the intel ethernet e100
> device driver. The patch has been tested, and appears to work well.
>
> Signed-off-by: Linas Vepstas <linas@linas.org>
>
> --
> Index: linux-2.6.14-git3/drivers/net/e100.c

I think these patches will be great, on the pseries, but
is there not a compile option that should compile out all this code, i.e.
#ifdef PCI_ERROR_RECOVERY

if the arch doesn't support it?

Jesse

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 29/42]: ethernet: add PCI error recovery to e100 dev driver
  2005-11-04  1:34   ` Jesse Brandeburg
@ 2005-11-04  1:51     ` Jesse Brandeburg
  0 siblings, 0 replies; 131+ messages in thread
From: Jesse Brandeburg @ 2005-11-04  1:51 UTC (permalink / raw)
  To: linas
  Cc: paulus, linuxppc64-dev, johnrose, linux-pci, bluesmoke-devel,
	linux-kernel

On 11/3/05, Jesse Brandeburg <jesse.brandeburg@gmail.com> wrote:
> I think these patches will be great, on the pseries, but
> is there not a compile option that should compile out all this code, i.e.
> #ifdef PCI_ERROR_RECOVERY
>
> if the arch doesn't support it?

Uh, i just saw patch 32, never mind.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 19/42]: ppc64: bugfix: crash on PHB add
  2005-11-04  0:51 ` [PATCH 19/42]: ppc64: bugfix: crash on PHB add Linas Vepstas
@ 2005-11-04 16:20   ` John Rose
  2005-11-04 16:35     ` linas
  0 siblings, 1 reply; 131+ messages in thread
From: John Rose @ 2005-11-04 16:20 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Paul Mackerras, External List, linux-pci, bluesmoke-devel, lkml

> This patch fixes a bug related to dlpar PHB add, after a PHB removal.

This and patch 18 seem logically separate from the feature.  This
complicates review and adds to an already large patch set.  Could we
handle these separately?

Thanks-
John


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 19/42]: ppc64: bugfix: crash on PHB add
  2005-11-04 16:20   ` John Rose
@ 2005-11-04 16:35     ` linas
  0 siblings, 0 replies; 131+ messages in thread
From: linas @ 2005-11-04 16:35 UTC (permalink / raw)
  To: John Rose; +Cc: Paul Mackerras, External List, linux-pci, bluesmoke-devel, lkml

On Fri, Nov 04, 2005 at 10:20:55AM -0600, John Rose was heard to remark:
> > This patch fixes a bug related to dlpar PHB add, after a PHB removal.
> 
> This and patch 18 seem logically separate from the feature.  This
> complicates review and adds to an already large patch set.  Could we
> handle these separately?

I sent these in separetely, a month ago, as bug fixes for the dlpar 
crashes in the pre-2.6.14 kernels, but these were never applied.  
Since they're needed to get EEH to work, I just sent them in again 
with this set.  Yes, I'm aware that the patch you sent yesterday
fixes the same bug in almost the same way. 

What you really want to concentrate on are patches 20 through 23
which mess with the guts of the rpaphp code. But again, these are 
the same old patches, they have not changed since the submit last 
month.

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 22/42]: PCI: remove duplicted pci hotplug code
  2005-11-04  0:52 ` [PATCH 22/42]: PCI: remove duplicted pci " Linas Vepstas
@ 2005-11-04 21:54   ` John Rose
  0 siblings, 0 replies; 131+ messages in thread
From: John Rose @ 2005-11-04 21:54 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Paul Mackerras, External List, linux-pci, bluesmoke-devel, lkml

> +extern void pcibios_claim_one_bus(struct pci_bus *b);
> +

Might need to export this for module use by the kernel.

Thanks-
John


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 41/42]: ppc64: Save device BARS much earlier in the boot sequence
  2005-11-04  0:55 ` [PATCH 41/42]: ppc64: Save device BARS much earlier in the boot sequence Linas Vepstas
@ 2005-11-04 22:14   ` linas
  0 siblings, 0 replies; 131+ messages in thread
From: linas @ 2005-11-04 22:14 UTC (permalink / raw)
  To: paulus, linuxppc64-dev; +Cc: johnrose, linux-pci, bluesmoke-devel, linux-kernel

Hi,

On Thu, Nov 03, 2005 at 06:55:19PM -0600, Linas Vepstas was heard to remark:
> Save the PCI device bars *before* any PCI probing is done. 

After a tiny bit of extra testing, I found one of those forehead
slapping bugs in this patch.  Here it is again, with the bug fixed.

(So far, all the other tests look good; I've survived 24 hours of 
several thousand artifical pci errors injected onto ethernet and scsi
intermingled with hundreds of pci slot adds/removes.)

------------

241-eeh-save-bars-earlier.patch

Save the PCI device bars *before* any PCI probing is done. 

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

--
Index: linux-2.6.14-git6/arch/ppc64/kernel/rtas_pci.c
===================================================================
--- linux-2.6.14-git6.orig/arch/ppc64/kernel/rtas_pci.c	2005-11-03 14:46:40.000000000 -0600
+++ linux-2.6.14-git6/arch/ppc64/kernel/rtas_pci.c	2005-11-03 14:50:22.000000000 -0600
@@ -72,7 +72,7 @@
         return 0;
 }
 
-static int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
+int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
 {
 	int returnval = -1;
 	unsigned long buid, addr;
Index: linux-2.6.14-git6/include/asm-powerpc/ppc-pci.h
===================================================================
--- linux-2.6.14-git6.orig/include/asm-powerpc/ppc-pci.h	2005-11-03 14:50:21.000000000 -0600
+++ linux-2.6.14-git6/include/asm-powerpc/ppc-pci.h	2005-11-03 14:50:22.000000000 -0600
@@ -59,8 +59,6 @@
 void pci_addr_cache_build(void);
 struct pci_dev *pci_get_device_by_addr(unsigned long addr);
 
-void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn);
-
 /**
  * eeh_slot_error_detail -- record and EEH error condition to the log
  * @severity: 1 if temporary, 2 if permanent failure.
@@ -104,6 +102,7 @@
 void rtas_configure_bridge(struct pci_dn *);
 
 int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
+int rtas_read_config(struct pci_dn *, int where, int size, u32 *val);
 
 /**
  * mark and clear slots: find "partition endpoint" PE and set or 
Index: linux-2.6.14-git6/include/asm-ppc64/pci-bridge.h
===================================================================
--- linux-2.6.14-git6.orig/include/asm-ppc64/pci-bridge.h	2005-11-03 14:50:15.000000000 -0600
+++ linux-2.6.14-git6/include/asm-ppc64/pci-bridge.h	2005-11-03 14:50:22.000000000 -0600
@@ -58,15 +58,15 @@
 struct iommu_table;
 
 struct pci_dn {
-	int	busno;			/* for pci devices */
-	int	bussubno;		/* for pci devices */
-	int	devfn;			/* for pci devices */
+	int	busno;			/* pci bus number */
+	int	bussubno;		/* pci subordinate bus number */
+	int	devfn;			/* pci device and function number */
+	int	class_code;		/* pci device class */
 	int	eeh_mode;		/* See eeh.h for possible EEH_MODEs */
 	int	eeh_config_addr;
 	int	eeh_pe_config_addr; /* new-style partition endpoint address */
 	int 	eeh_check_count;	/* # times driver ignored error */
 	int 	eeh_freeze_count;	/* # times this device froze up. */
-	int	eeh_is_bridge;		/* device is pci-to-pci bridge */
 
 	int	pci_ext_config_space;	/* for pci devices */
 	struct  pci_controller *phb;	/* for pci devices */
Index: linux-2.6.14-git6/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.14-git6.orig/arch/powerpc/platforms/pseries/eeh.c	2005-11-03 14:50:21.000000000 -0600
+++ linux-2.6.14-git6/arch/powerpc/platforms/pseries/eeh.c	2005-11-04 16:07:29.596059751 -0600
@@ -106,6 +106,8 @@
 static DEFINE_PER_CPU(unsigned long, ignored_failures);
 static DEFINE_PER_CPU(unsigned long, slot_resets);
 
+#define IS_BRIDGE(class_code) (((class_code)<<16) == PCI_BASE_CLASS_BRIDGE)
+
 /* --------------------------------------------------------------- */
 /* Below lies the EEH event infrastructure */
 
@@ -620,7 +622,7 @@
 	if (!pdn) 
 		return;
 	
-	if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && (!pdn->eeh_is_bridge))
+	if ((pdn->eeh_mode & EEH_MODE_SUPPORTED) && !IS_BRIDGE(pdn->class_code))
 		__restore_bars (pdn);
 
 	dn = pdn->node->child;
@@ -638,18 +640,15 @@
  * PCI devices are added individuallly; but, for the restore,
  * an entire slot is reset at a time.
  */
-void eeh_save_bars(struct pci_dev * pdev, struct pci_dn *pdn)
+static void eeh_save_bars(struct pci_dn *pdn)
 {
 	int i;
 
-	if (!pdev || !pdn )
+	if (!pdn )
 		return;
 	
 	for (i = 0; i < 16; i++)
-		pci_read_config_dword(pdev, i * 4, &pdn->config_space[i]);
-
-	if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
-		pdn->eeh_is_bridge = 1;
+		rtas_read_config(pdn, i * 4, 4, &pdn->config_space[i]);
 }
 
 void
@@ -703,6 +702,9 @@
 	pdn->eeh_check_count = 0;
 	pdn->eeh_freeze_count = 0;
 
+	if (class_code)
+		pdn->class_code = *class_code;
+	
 	if (status && strcmp(status, "ok") != 0)
 		return NULL;	/* ignore devices with bad status */
 
@@ -781,6 +783,7 @@
 		       dn->full_name);
 	}
 
+	eeh_save_bars(pdn);
 	return NULL;
 }
 
@@ -915,7 +918,6 @@
 	pdn->pcidev = dev;
 
 	pci_addr_cache_insert_device (dev);
-	eeh_save_bars(dev, pdn);
 }
 EXPORT_SYMBOL_GPL(eeh_add_device_late);
 
Index: linux-2.6.14-git6/arch/powerpc/platforms/pseries/eeh_cache.c
===================================================================
--- linux-2.6.14-git6.orig/arch/powerpc/platforms/pseries/eeh_cache.c	2005-11-03 14:50:19.000000000 -0600
+++ linux-2.6.14-git6/arch/powerpc/platforms/pseries/eeh_cache.c	2005-11-04 10:22:51.000000000 -0600
@@ -304,10 +304,7 @@
 
 		pci_addr_cache_insert_device(dev);
 
-		/* Save the BAR's; firmware doesn't restore these after EEH reset */
 		dn = pci_device_to_OF_node(dev);
-		eeh_save_bars(dev, PCI_DN(dn));
-
 		pci_dev_get (dev);  /* matching put is in eeh_remove_device() */
 		PCI_DN(dn)->pcidev = dev;
 	}



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers
  2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
                   ` (42 preceding siblings ...)
  2005-11-04  0:57 ` [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64 Linas Vepstas
@ 2005-11-04 22:14 ` Greg KH
  2005-11-05  0:08   ` Paul Mackerras
  43 siblings, 1 reply; 131+ messages in thread
From: Greg KH @ 2005-11-04 22:14 UTC (permalink / raw)
  To: linas
  Cc: paulus, linuxppc64-dev, johnrose, linux-pci, bluesmoke-devel,
	linux-kernel

On Thu, Nov 03, 2005 at 05:59:18PM -0600, Linas Vepstas wrote:
> What follows is a long sequence of mostly small patches to implement
> PCI Error Recovery by adding notification callbacks to the PCI device
> driver structure, implementing the recovery in 5 device drivers
> (3 ethernet, two scsi drivers), and adding the actual error detection
> and recovery code to the ppc64/powerpc arch tree.
> 
> Highlights:
> 
> -- Patches 1-14: Misc required ppc64/powerpc cleanup/bugfixes/restructuring
> -- Patch 15: Overview documentation
> -- Patch 16: changes to include/linux/pci.h
> -- Patches 17-26: error detection and recovery for pSeries PCI bridge chips
> -- Patchs 27-32: recovery patches for ethernet, scsi device drivers
> -- Patches 33-42: More misc ppc64-specific changes

Ok, so at first glance, I only need to pay attention to patches 15, 16,
and 27-32?  If so, please send the ppc64 specific patches through the
ppc64 maintainers, and the rpaphp specific patches through that specific
maintainer.  Then care to resend the 8 remaining patches to me, so I can
stage them in -mm for a while?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers
  2005-11-04 22:14 ` [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Greg KH
@ 2005-11-05  0:08   ` Paul Mackerras
  2005-11-05  0:28     ` Greg KH
  0 siblings, 1 reply; 131+ messages in thread
From: Paul Mackerras @ 2005-11-05  0:08 UTC (permalink / raw)
  To: Greg KH
  Cc: linas, linuxppc64-dev, johnrose, linux-pci, bluesmoke-devel,
	linux-kernel

Greg KH writes:

> Ok, so at first glance, I only need to pay attention to patches 15, 16,
> and 27-32?  If so, please send the ppc64 specific patches through the
> ppc64 maintainers, and the rpaphp specific patches through that specific
> maintainer.  Then care to resend the 8 remaining patches to me, so I can
> stage them in -mm for a while?

I'm happy to take care of the ppc64-specific patches.

I would *really* like to see 16 go to Linus as soon as possible, since
everything else depends on it, and since it has very little chance of
breaking any existing code.  Would you be OK with sending 16 to Linus
within the next week?

Thanks,
Paul.


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers
  2005-11-05  0:08   ` Paul Mackerras
@ 2005-11-05  0:28     ` Greg KH
  2005-11-05  0:46       ` Paul Mackerras
  0 siblings, 1 reply; 131+ messages in thread
From: Greg KH @ 2005-11-05  0:28 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linas, linuxppc64-dev, johnrose, linux-pci, bluesmoke-devel,
	linux-kernel

On Sat, Nov 05, 2005 at 11:08:17AM +1100, Paul Mackerras wrote:
> Greg KH writes:
> 
> > Ok, so at first glance, I only need to pay attention to patches 15, 16,
> > and 27-32?  If so, please send the ppc64 specific patches through the
> > ppc64 maintainers, and the rpaphp specific patches through that specific
> > maintainer.  Then care to resend the 8 remaining patches to me, so I can
> > stage them in -mm for a while?
> 
> I'm happy to take care of the ppc64-specific patches.
> 
> I would *really* like to see 16 go to Linus as soon as possible, since
> everything else depends on it, and since it has very little chance of
> breaking any existing code.  Would you be OK with sending 16 to Linus
> within the next week?

Can I take 15, 16, 27-32 now without the ppc64 patches dying without it?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers
  2005-11-05  0:28     ` Greg KH
@ 2005-11-05  0:46       ` Paul Mackerras
  2005-11-05  1:28         ` Greg KH
  0 siblings, 1 reply; 131+ messages in thread
From: Paul Mackerras @ 2005-11-05  0:46 UTC (permalink / raw)
  To: Greg KH
  Cc: linas, linuxppc64-dev, johnrose, linux-pci, bluesmoke-devel,
	linux-kernel

Greg KH writes:

> Can I take 15, 16, 27-32 now without the ppc64 patches dying without it?

Sorry, I'm having trouble parsing that.  If you mean, will it break
ppc64 if you send those patches to Linus before the ppc64 bits get in,
the answer is no it won't, please send them on.

Thanks,
Paul.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers
  2005-11-05  0:46       ` Paul Mackerras
@ 2005-11-05  1:28         ` Greg KH
  0 siblings, 0 replies; 131+ messages in thread
From: Greg KH @ 2005-11-05  1:28 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linas, linuxppc64-dev, johnrose, linux-pci, bluesmoke-devel,
	linux-kernel

On Sat, Nov 05, 2005 at 11:46:44AM +1100, Paul Mackerras wrote:
> Greg KH writes:
> 
> > Can I take 15, 16, 27-32 now without the ppc64 patches dying without it?
> 
> Sorry, I'm having trouble parsing that.  If you mean, will it break
> ppc64 if you send those patches to Linus before the ppc64 bits get in,
> the answer is no it won't, please send them on.

Ok, I'll go look at them and see if they can be added to my tree for
testing in -mm before I'll send them to Linus.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-04  0:50 ` [PATCH 16/42]: PCI: PCI Error reporting callbacks Linas Vepstas
@ 2005-11-05  6:11   ` Greg KH
  2005-11-06 23:25     ` Paul Mackerras
  0 siblings, 1 reply; 131+ messages in thread
From: Greg KH @ 2005-11-05  6:11 UTC (permalink / raw)
  To: linas
  Cc: paulus, linuxppc64-dev, johnrose, linux-pci, bluesmoke-devel,
	linux-kernel

On Thu, Nov 03, 2005 at 06:50:35PM -0600, Linas Vepstas wrote:
> +/* ---------------------------------------------------------------- */
> +/** PCI error recovery infrastructure.  If a PCI device driver provides
> + *  a set fof callbacks in struct pci_error_handlers, then that device driver
> + *  will be notified of PCI bus errors, and will be driven to recovery
> + *  when an error occurs.
> + */
> +
> +enum pcierr_result {
> +	PCIERR_RESULT_NONE=0,        /* no result/none/not supported in device driver */
> +	PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
> +	PCIERR_RESULT_NEED_RESET,    /* Device driver wants slot to be reset. */
> +	PCIERR_RESULT_DISCONNECT,    /* Device has completely failed, is unrecoverable */
> +	PCIERR_RESULT_RECOVERED,     /* Device driver is fully recovered and operational */
> +};

No, do not create new types of error or return codes.  Use the standard
-EFOO values.  You can document what they should each return, and mean,
but do not create new codes.

Also, you create an enum, but yet do not use it in your function
callback definition, which means you really didn't want to create it in
the first place...

I'll add 15 and 16 to my tree for now, so they will show up in -mm, but
I want to see updated versions before sending them off to Linus.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-05  6:11   ` Greg KH
@ 2005-11-06 23:25     ` Paul Mackerras
  2005-11-07 17:55       ` linas
  0 siblings, 1 reply; 131+ messages in thread
From: Paul Mackerras @ 2005-11-06 23:25 UTC (permalink / raw)
  To: Greg KH
  Cc: linas, linuxppc64-dev, johnrose, linux-pci, bluesmoke-devel,
	linux-kernel

Greg KH writes:

> > +enum pcierr_result {
> > +	PCIERR_RESULT_NONE=0,        /* no result/none/not supported in device driver */
> > +	PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
> > +	PCIERR_RESULT_NEED_RESET,    /* Device driver wants slot to be reset. */
> > +	PCIERR_RESULT_DISCONNECT,    /* Device has completely failed, is unrecoverable */
> > +	PCIERR_RESULT_RECOVERED,     /* Device driver is fully recovered and operational */
> > +};
> 
> No, do not create new types of error or return codes.  Use the standard
> -EFOO values.  You can document what they should each return, and mean,
> but do not create new codes.

Actually, these are not error or return codes, but rather requested
actions (maybe somewhat misnamed).  We can map them on to -EFOO values
but it will be rather strained (-ECONNRESET for "please reset the
slot", anyone? :).

> Also, you create an enum, but yet do not use it in your function
> callback definition, which means you really didn't want to create it in
> the first place...

Yes, they could be #defines.

Paul.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-06 23:25     ` Paul Mackerras
@ 2005-11-07 17:55       ` linas
  2005-11-07 18:27         ` Greg KH
  0 siblings, 1 reply; 131+ messages in thread
From: linas @ 2005-11-07 17:55 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Greg KH, linuxppc64-dev, johnrose, linux-pci, bluesmoke-devel,
	linux-kernel

On Mon, Nov 07, 2005 at 10:25:39AM +1100, Paul Mackerras was heard to remark:
> Greg KH writes:
> 
> > > +enum pcierr_result {
> > > +	PCIERR_RESULT_NONE=0,        /* no result/none/not supported in device driver */
> > > +	PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
> > > +	PCIERR_RESULT_NEED_RESET,    /* Device driver wants slot to be reset. */
> > > +	PCIERR_RESULT_DISCONNECT,    /* Device has completely failed, is unrecoverable */
> > > +	PCIERR_RESULT_RECOVERED,     /* Device driver is fully recovered and operational */
> > > +};
> > 
> > No, do not create new types of error or return codes.  Use the standard
> > -EFOO values.  You can document what they should each return, and mean,
> > but do not create new codes.
> 
> Actually, these are not error or return codes, but rather requested
> actions 

Yes. 

> (maybe somewhat misnamed).  

As to naming, my mind went blank on coming up with a good name,
and the results was a poor name.

I now note that "EDAC" ("Error Detection ad Correction") is now taken.

How about "PECS" ("PCI Error Correction System") ? 

I guess "PCI Error Detection And Recovery System" (PEDERAST) might
have an inappropriate set of connotations.

> We can map them on to -EFOO values
> but it will be rather strained (-ECONNRESET for "please reset the
> slot", anyone? :).

Yes, that would only lead to confusion.

> > Also, you create an enum, but yet do not use it in your function
> > callback definition, which means you really didn't want to create it in
> > the first place...
> 
> Yes, they could be #defines.

In one incarnation, they were #defines.  The enum was supposed to be 
the return value of the error notification callbacks.  

I can prepare a new patch: would you prefer:

1) lose typing: #defines and int return value?

2) strong typing: enum and enum return value?

I often prefer strong typing.

And do you want a patch now, or later?

--linas


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 17:55       ` linas
@ 2005-11-07 18:27         ` Greg KH
  2005-11-07 18:56           ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] linas
  2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
  0 siblings, 2 replies; 131+ messages in thread
From: Greg KH @ 2005-11-07 18:27 UTC (permalink / raw)
  To: linas
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 11:55:41AM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 10:25:39AM +1100, Paul Mackerras was heard to remark:
> > Greg KH writes:
> > 
> > > > +enum pcierr_result {
> > > > +	PCIERR_RESULT_NONE=0,        /* no result/none/not supported in device driver */
> > > > +	PCIERR_RESULT_CAN_RECOVER=1, /* Device driver can recover without slot reset */
> > > > +	PCIERR_RESULT_NEED_RESET,    /* Device driver wants slot to be reset. */
> > > > +	PCIERR_RESULT_DISCONNECT,    /* Device has completely failed, is unrecoverable */
> > > > +	PCIERR_RESULT_RECOVERED,     /* Device driver is fully recovered and operational */
> > > > +};
> > > 
> > > No, do not create new types of error or return codes.  Use the standard
> > > -EFOO values.  You can document what they should each return, and mean,
> > > but do not create new codes.
> > 
> > Actually, these are not error or return codes, but rather requested
> > actions 
> 
> Yes. 

Ok, then make them be stronger, and not return an int, as everyone will
get that wrong.

> In one incarnation, they were #defines.  The enum was supposed to be 
> the return value of the error notification callbacks.  
> 
> I can prepare a new patch: would you prefer:
> 
> 1) lose typing: #defines and int return value?
> 
> 2) strong typing: enum and enum return value?

3) realy strong typing that sparse can detect.

enums don't really work, as you can get away with using an integer and
the compiler will never complain.  Please use a typedef (yeah, I said
typedef) in the way that sparse will catch any bad users of the code.

> I often prefer strong typing.
> 
> And do you want a patch now, or later?

Depends on when you want to see this make it into mainline :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-11-07 18:27         ` Greg KH
@ 2005-11-07 18:56           ` linas
  2005-11-07 19:02             ` Greg KH
  2005-11-07 19:04             ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] Randy.Dunlap
  2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
  1 sibling, 2 replies; 131+ messages in thread
From: linas @ 2005-11-07 18:56 UTC (permalink / raw)
  To: Greg KH
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> 
> 3) realy strong typing that sparse can detect.

Am compiling now.

> enums don't really work, as you can get away with using an integer and
> the compiler will never complain.  Please use a typedef (yeah, I said
> typedef) in the way that sparse will catch any bad users of the code.

How about typedef'ing  structs?

I'm not to clear on what "sparse" can do; however, in the good old days,
gcc allowed you to commit great sins when passing "struct blah *" to 
subroutines, whereas it stoped you cold if you tried the same trick 
with a typedef'ed "blah_t *".  This got me into the habit of turning
all structs into typedefs in my personal projects.  Can we expect
something similar for the kernel, and in particular, should we start
typedefing structs now?

(Documentation/CodingStyle doesn't mention typedef at all).

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-11-07 18:56           ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] linas
@ 2005-11-07 19:02             ` Greg KH
  2005-11-07 19:36               ` linas
  2005-11-07 19:04             ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] Randy.Dunlap
  1 sibling, 1 reply; 131+ messages in thread
From: Greg KH @ 2005-11-07 19:02 UTC (permalink / raw)
  To: linas
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 12:56:21PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 
> > 3) realy strong typing that sparse can detect.
> 
> Am compiling now.
> 
> > enums don't really work, as you can get away with using an integer and
> > the compiler will never complain.  Please use a typedef (yeah, I said
> > typedef) in the way that sparse will catch any bad users of the code.
> 
> How about typedef'ing  structs?

No.  Use __bitwise.  See the lkml archives for how to do this properly.

> I'm not to clear on what "sparse" can do; however, in the good old days,
> gcc allowed you to commit great sins when passing "struct blah *" to 
> subroutines, whereas it stoped you cold if you tried the same trick 
> with a typedef'ed "blah_t *".  This got me into the habit of turning
> all structs into typedefs in my personal projects.  Can we expect
> something similar for the kernel, and in particular, should we start
> typedefing structs now?

No, never typedef a struct.  That's just wrong.  gcc should warn you
just the same if you pass the wrong struct pointer (and all of your code
builds without warnings, right?)

> (Documentation/CodingStyle doesn't mention typedef at all).

If it does, it should say not to use it at all :)

Except for this case, it's special...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-11-07 18:56           ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] linas
  2005-11-07 19:02             ` Greg KH
@ 2005-11-07 19:04             ` Randy.Dunlap
  1 sibling, 0 replies; 131+ messages in thread
From: Randy.Dunlap @ 2005-11-07 19:04 UTC (permalink / raw)
  To: linas
  Cc: Greg KH, Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, 7 Nov 2005, linas wrote:

> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> >
> > 3) realy strong typing that sparse can detect.
>
> Am compiling now.
>
> > enums don't really work, as you can get away with using an integer and
> > the compiler will never complain.  Please use a typedef (yeah, I said
> > typedef) in the way that sparse will catch any bad users of the code.
>
> How about typedef'ing  structs?

No no no.  (I feel sure that you will get plenty of responses.)

> I'm not to clear on what "sparse" can do; however, in the good old days,
> gcc allowed you to commit great sins when passing "struct blah *" to
> subroutines, whereas it stoped you cold if you tried the same trick
> with a typedef'ed "blah_t *".  This got me into the habit of turning
> all structs into typedefs in my personal projects.  Can we expect
> something similar for the kernel, and in particular, should we start
> typedefing structs now?

No no no.

> (Documentation/CodingStyle doesn't mention typedef at all).

We can submit patches for that.

Basically (generally) we never want a struct to be typedef-ed.
(There may be a couple of exceptions to this.)

We do allow a very few basic types to be typedef-ed, as long as
the basic type (e.g., pid_t) is also a C language basic type or
the typedef is useful for strong type checking.

-- 
~Randy

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-11-07 19:02             ` Greg KH
@ 2005-11-07 19:36               ` linas
  2005-11-07 20:02                 ` Greg KH
  0 siblings, 1 reply; 131+ messages in thread
From: linas @ 2005-11-07 19:36 UTC (permalink / raw)
  To: Greg KH
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark:
> 
> > I'm not to clear on what "sparse" can do; however, in the good old days,
> > gcc allowed you to commit great sins when passing "struct blah *" to 
> > subroutines, whereas it stoped you cold if you tried the same trick 
> > with a typedef'ed "blah_t *".  This got me into the habit of turning
> > all structs into typedefs in my personal projects.  Can we expect
> > something similar for the kernel, and in particular, should we start
> > typedefing structs now?
> 
> No, never typedef a struct.  That's just wrong.  

Its a defacto convention for most C-language apps, see, for 
example Xlib, gtk and gnome.  Also, "grep typedef include/linux/*"
shows that many kernel device drivers use this convention.

> gcc should warn you
> just the same if you pass the wrong struct pointer 

There were many cases where it did not warn (I don't remember 
the case of subr calls). I beleive this had to do with ANSI-C spec
issues dating to the 1990's; traditional C is weakly typed.

Its not just gcc; anyoe who coded for a while eventually discovered
that tyedefs where strongly typed, but "struct blah *" were not.

> (and all of your code
> builds without warnings, right?)

:-/  Yes, of course.

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 18:27         ` Greg KH
  2005-11-07 18:56           ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] linas
@ 2005-11-07 19:57           ` linas
  2005-11-07 19:59             ` Christoph Hellwig
                               ` (7 more replies)
  1 sibling, 8 replies; 131+ messages in thread
From: linas @ 2005-11-07 19:57 UTC (permalink / raw)
  To: Greg KH
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> 3) realy strong typing that sparse can detect.


PCI Error Recovery: header file patch

Change enums and subroutine signatures to be strongly typed, per recent
discussion with GregKH. Also, change the acronym to the more unique, 
less generic "PERS" "PCI Error Recovery System".

Greg, Please apply.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

--
Index: linux-2.6.14-mm1/include/linux/pci.h
===================================================================
--- linux-2.6.14-mm1.orig/include/linux/pci.h	2005-11-07 13:55:28.528843983 -0600
+++ linux-2.6.14-mm1/include/linux/pci.h	2005-11-07 13:55:35.745830682 -0600
@@ -82,11 +82,11 @@
  *  the pci device.  If some PCI bus between here and the pci device
  *  has crashed or locked up, this info is reflected here.
  */
-enum pci_channel_state {
+typedef enum {
 	pci_channel_io_normal = 0,	/* I/O channel is in normal state */
 	pci_channel_io_frozen = 1,	/* I/O to channel is blocked */
 	pci_channel_io_perm_failure,	/* PCI card is dead */
-};
+} pci_channel_state_t;
 
 /*
  * The pci_dev structure is used to describe PCI devices.
@@ -121,7 +121,7 @@
 					   this is D0-D3, D0 being fully functional,
 					   and D3 being off. */
 
-	enum pci_channel_state error_state;	/* current connectivity state */
+	pci_channel_state_t error_state;	/* current connectivity state */
 	struct	device	dev;		/* Generic device interface */
 
 	/* device is compatible with these IDs */
@@ -245,35 +245,35 @@
 };
 
 /* ---------------------------------------------------------------- */
-/** PCI error recovery infrastructure.  If a PCI device driver provides
+/** PCI Error Recovery System (PERS).  If a PCI device driver provides
  *  a set fof callbacks in struct pci_error_handlers, then that device driver
  *  will be notified of PCI bus errors, and will be driven to recovery
  *  when an error occurs.
  */
 
-enum pcierr_result {
-	PCIERR_RESULT_NONE = 0,		/* no result/none/not supported in device driver */
-	PCIERR_RESULT_CAN_RECOVER=1,	/* Device driver can recover without slot reset */
-	PCIERR_RESULT_NEED_RESET,	/* Device driver wants slot to be reset. */
-	PCIERR_RESULT_DISCONNECT,	/* Device has completely failed, is unrecoverable */
-	PCIERR_RESULT_RECOVERED,	/* Device driver is fully recovered and operational */
-};
+typedef enum {
+	PERS_RESULT_NONE = 0,		/* no result/none/not supported in device driver */
+	PERS_RESULT_CAN_RECOVER=1,	/* Device driver can recover without slot reset */
+	PERS_RESULT_NEED_RESET,	/* Device driver wants slot to be reset. */
+	PERS_RESULT_DISCONNECT,	/* Device has completely failed, is unrecoverable */
+	PERS_RESULT_RECOVERED,	/* Device driver is fully recovered and operational */
+} pers_result_t;
 
 /* PCI bus error event callbacks */
 struct pci_error_handlers
 {
 	/* PCI bus error detected on this device */
-	int (*error_detected)(struct pci_dev *dev,
-	                      enum pci_channel_state error);
+	pers_result_t (*error_detected)(struct pci_dev *dev,
+	                      pci_channel_state_t error);
 
 	/* MMIO has been re-enabled, but not DMA */
-	int (*mmio_enabled)(struct pci_dev *dev);
+	pers_result_t (*mmio_enabled)(struct pci_dev *dev);
 
 	/* PCI Express link has been reset */
-	int (*link_reset)(struct pci_dev *dev);
+	pers_result_t (*link_reset)(struct pci_dev *dev);
 
 	/* PCI slot has been reset */
-	int (*slot_reset)(struct pci_dev *dev);
+	pers_result_t (*slot_reset)(struct pci_dev *dev);
 
 	/* Device driver may resume normal operations */
 	void (*resume)(struct pci_dev *dev);

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks
  2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
@ 2005-11-07 19:59             ` Christoph Hellwig
  2005-11-07 20:03             ` Greg KH
                               ` (6 subsequent siblings)
  7 siblings, 0 replies; 131+ messages in thread
From: Christoph Hellwig @ 2005-11-07 19:59 UTC (permalink / raw)
  To: linas
  Cc: Greg KH, linux-kernel, bluesmoke-devel, Paul Mackerras,
	linuxppc64-dev, linux-pci

On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.
> 
> 
> PCI Error Recovery: header file patch
> 
> Change enums and subroutine signatures to be strongly typed, per recent
> discussion with GregKH. Also, change the acronym to the more unique, 
> less generic "PERS" "PCI Error Recovery System".
> 
> Greg, Please apply.
> 
> Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
> 
> --
> Index: linux-2.6.14-mm1/include/linux/pci.h
> ===================================================================
> --- linux-2.6.14-mm1.orig/include/linux/pci.h	2005-11-07 13:55:28.528843983 -0600
> +++ linux-2.6.14-mm1/include/linux/pci.h	2005-11-07 13:55:35.745830682 -0600
> @@ -82,11 +82,11 @@
>   *  the pci device.  If some PCI bus between here and the pci device
>   *  has crashed or locked up, this info is reflected here.
>   */
> -enum pci_channel_state {
> +typedef enum {
>  	pci_channel_io_normal = 0,	/* I/O channel is in normal state */
>  	pci_channel_io_frozen = 1,	/* I/O to channel is blocked */
>  	pci_channel_io_perm_failure,	/* PCI card is dead */
> -};
> +} pci_channel_state_t;

this is not strongly typed, just a completely useless typedef.


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-11-07 19:36               ` linas
@ 2005-11-07 20:02                 ` Greg KH
  2005-11-07 20:41                   ` linas
  0 siblings, 1 reply; 131+ messages in thread
From: Greg KH @ 2005-11-07 20:02 UTC (permalink / raw)
  To: linas
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 01:36:00PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark:
> > 
> > > I'm not to clear on what "sparse" can do; however, in the good old days,
> > > gcc allowed you to commit great sins when passing "struct blah *" to 
> > > subroutines, whereas it stoped you cold if you tried the same trick 
> > > with a typedef'ed "blah_t *".  This got me into the habit of turning
> > > all structs into typedefs in my personal projects.  Can we expect
> > > something similar for the kernel, and in particular, should we start
> > > typedefing structs now?
> > 
> > No, never typedef a struct.  That's just wrong.  
> 
> Its a defacto convention for most C-language apps, see, for 
> example Xlib, gtk and gnome.

The kernel is not those projects.

> Also, "grep typedef include/linux/*" shows that many kernel device
> drivers use this convention.

They are wrong and should be fixed.

See my old OLS paper on all about the problems of using typedefs in
kernel code.

> > gcc should warn you
> > just the same if you pass the wrong struct pointer 
> 
> There were many cases where it did not warn (I don't remember 
> the case of subr calls). I beleive this had to do with ANSI-C spec
> issues dating to the 1990's; traditional C is weakly typed.
> 
> Its not just gcc; anyoe who coded for a while eventually discovered
> that tyedefs where strongly typed, but "struct blah *" were not.

Sorry, but you are using a broken compiler if it doesn't complain about
this.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
  2005-11-07 19:59             ` Christoph Hellwig
@ 2005-11-07 20:03             ` Greg KH
  2005-11-07 21:21               ` [PATCH 1/7]: PCI revised (2) " linas
  2005-11-07 21:30             ` [PATCH 2/7]: Revised [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver linas
                               ` (5 subsequent siblings)
  7 siblings, 1 reply; 131+ messages in thread
From: Greg KH @ 2005-11-07 20:03 UTC (permalink / raw)
  To: linas
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.
> 
> 
> PCI Error Recovery: header file patch
> 
> Change enums and subroutine signatures to be strongly typed, per recent
> discussion with GregKH. Also, change the acronym to the more unique, 
> less generic "PERS" "PCI Error Recovery System".
> 
> Greg, Please apply.
> 
> Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
> 
> --
> Index: linux-2.6.14-mm1/include/linux/pci.h
> ===================================================================
> --- linux-2.6.14-mm1.orig/include/linux/pci.h	2005-11-07 13:55:28.528843983 -0600
> +++ linux-2.6.14-mm1/include/linux/pci.h	2005-11-07 13:55:35.745830682 -0600
> @@ -82,11 +82,11 @@
>   *  the pci device.  If some PCI bus between here and the pci device
>   *  has crashed or locked up, this info is reflected here.
>   */
> -enum pci_channel_state {
> +typedef enum {
>  	pci_channel_io_normal = 0,	/* I/O channel is in normal state */
>  	pci_channel_io_frozen = 1,	/* I/O to channel is blocked */
>  	pci_channel_io_perm_failure,	/* PCI card is dead */
> -};
> +} pci_channel_state_t;

No, this doesn't help out at all.  Please go look at the __bitwise
documentation.

Good luck,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-11-07 20:02                 ` Greg KH
@ 2005-11-07 20:41                   ` linas
  2005-11-07 20:46                     ` Greg KH
  2005-11-08  1:11                     ` Steven Rostedt
  0 siblings, 2 replies; 131+ messages in thread
From: linas @ 2005-11-07 20:41 UTC (permalink / raw)
  To: Greg KH
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 12:02:57PM -0800, Greg KH was heard to remark:
> On Mon, Nov 07, 2005 at 01:36:00PM -0600, linas wrote:
> > On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark:
> > > 
> > > No, never typedef a struct.  That's just wrong.  
> > 
> > Its a defacto convention for most C-language apps, see, for 
> > example Xlib, gtk and gnome.
> 
> The kernel is not those projects.

!!

> > Also, "grep typedef include/linux/*" shows that many kernel device
> > drivers use this convention.
> 
> They are wrong and should be fixed.

What, precisely, is wrong?

> See my old OLS paper on all about the problems of using typedefs in
> kernel code.

Is this on the web somewhere? Google is having trouble finding it.

I understand that old code bases often choke on typedefs;
forward declarations are a big problem. Not to be rude, 
but choking for forward decl's is often a symptom of 
poorly-designed code.

> > > gcc should warn you
> > > just the same if you pass the wrong struct pointer 
> > 
> > There were many cases where it did not warn (I don't remember 
> > the case of subr calls). I beleive this had to do with ANSI-C spec
> > issues dating to the 1990's; traditional C is weakly typed.
> > 
> > Its not just gcc; anyoe who coded for a while eventually discovered
> > that tyedefs where strongly typed, but "struct blah *" were not.
> 
> Sorry, but you are using a broken compiler if it doesn't complain about
> this.

Uhh, gcc? 

Maybe I've just got more mileage under my wheels. Of all of the 
compilers I've used, gcc has always had the strictest checking, 
and was the most verbose about warnings.  There's a trick that pros
use when they inherit crufty old code: run it through gcc first, and
clean it up, even if the project requires using some other compiler.

I was simply stating a fact about gcc and about standard ANSI-C 
type-checking that is "well known" to anyone who's been around the 
block. I was not trying to start an argument.

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-11-07 20:41                   ` linas
@ 2005-11-07 20:46                     ` Greg KH
  2005-11-08  1:11                     ` Steven Rostedt
  1 sibling, 0 replies; 131+ messages in thread
From: Greg KH @ 2005-11-07 20:46 UTC (permalink / raw)
  To: linas
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 02:41:36PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 12:02:57PM -0800, Greg KH was heard to remark:
> > On Mon, Nov 07, 2005 at 01:36:00PM -0600, linas wrote:
> > > On Mon, Nov 07, 2005 at 11:02:45AM -0800, Greg KH was heard to remark:
> > > > 
> > > > No, never typedef a struct.  That's just wrong.  
> > > 
> > > Its a defacto convention for most C-language apps, see, for 
> > > example Xlib, gtk and gnome.
> > 
> > The kernel is not those projects.
> 
> !!

Yeah, anyone who thinks that Xlib is the paradigm for coding style...

> > > Also, "grep typedef include/linux/*" shows that many kernel device
> > > drivers use this convention.
> > 
> > They are wrong and should be fixed.
> 
> What, precisely, is wrong?
> 
> > See my old OLS paper on all about the problems of using typedefs in
> > kernel code.
> 
> Is this on the web somewhere? Google is having trouble finding it.

http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_paper/codingstyle.ps
and the presentation is at:
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/

> > > > gcc should warn you
> > > > just the same if you pass the wrong struct pointer 
> > > 
> > > There were many cases where it did not warn (I don't remember 
> > > the case of subr calls). I beleive this had to do with ANSI-C spec
> > > issues dating to the 1990's; traditional C is weakly typed.
> > > 
> > > Its not just gcc; anyoe who coded for a while eventually discovered
> > > that tyedefs where strongly typed, but "struct blah *" were not.
> > 
> > Sorry, but you are using a broken compiler if it doesn't complain about
> > this.
> 
> Uhh, gcc? 

Try it in the kernel today.  You will get a warning if you pass in a
pointer to a different structure type than it was defined as.

> I was simply stating a fact about gcc and about standard ANSI-C 
> type-checking that is "well known" to anyone who's been around the 
> block. I was not trying to start an argument.

Then let's end it here...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 1/7]: PCI revised (2) [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 20:03             ` Greg KH
@ 2005-11-07 21:21               ` linas
  2005-11-07 21:37                 ` Greg KH
  0 siblings, 1 reply; 131+ messages in thread
From: linas @ 2005-11-07 21:21 UTC (permalink / raw)
  To: Greg KH
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 12:03:52PM -0800, Greg KH was heard to remark:
> On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas wrote:
> > On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > > 3) realy strong typing that sparse can detect.
> Please go look at the __bitwise documentation.

PCI Error Recovery: header file patch

Change enums and subroutine signatures to be strongly typed, per recent
discussion with GregKH. Also, change the acronym to the more unique, 
less generic "PERS" "PCI Error Recovery System".

Greg, Please apply.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

--
Index: linux-2.6.14-mm1/include/linux/pci.h
===================================================================
--- linux-2.6.14-mm1.orig/include/linux/pci.h	2005-11-07 13:55:28.528843983 -0600
+++ linux-2.6.14-mm1/include/linux/pci.h	2005-11-07 14:56:04.917367579 -0600
@@ -82,10 +82,12 @@
  *  the pci device.  If some PCI bus between here and the pci device
  *  has crashed or locked up, this info is reflected here.
  */
+typedef int __bitwise pci_channel_state_t;
+
 enum pci_channel_state {
-	pci_channel_io_normal = 0,	/* I/O channel is in normal state */
-	pci_channel_io_frozen = 1,	/* I/O to channel is blocked */
-	pci_channel_io_perm_failure,	/* PCI card is dead */
+	pci_channel_io_normal = (__force pci_channel_state_t) 0,	/* I/O channel is in normal state */
+	pci_channel_io_frozen = (__force pci_channel_state_t) 1,	/* I/O to channel is blocked */
+	pci_channel_io_perm_failure = (__force pci_channel_state_t) 2,	/* PCI card is dead */
 };
 
 /*
@@ -121,7 +123,7 @@
 					   this is D0-D3, D0 being fully functional,
 					   and D3 being off. */
 
-	enum pci_channel_state error_state;	/* current connectivity state */
+	pci_channel_state_t error_state;	/* current connectivity state */
 	struct	device	dev;		/* Generic device interface */
 
 	/* device is compatible with these IDs */
@@ -245,35 +247,37 @@
 };
 
 /* ---------------------------------------------------------------- */
-/** PCI error recovery infrastructure.  If a PCI device driver provides
+/** PCI Error Recovery System (PERS).  If a PCI device driver provides
  *  a set fof callbacks in struct pci_error_handlers, then that device driver
  *  will be notified of PCI bus errors, and will be driven to recovery
  *  when an error occurs.
  */
 
-enum pcierr_result {
-	PCIERR_RESULT_NONE = 0,		/* no result/none/not supported in device driver */
-	PCIERR_RESULT_CAN_RECOVER=1,	/* Device driver can recover without slot reset */
-	PCIERR_RESULT_NEED_RESET,	/* Device driver wants slot to be reset. */
-	PCIERR_RESULT_DISCONNECT,	/* Device has completely failed, is unrecoverable */
-	PCIERR_RESULT_RECOVERED,	/* Device driver is fully recovered and operational */
+typedef int __bitwise pers_result_t;
+
+enum pers_result {
+	PERS_RESULT_NONE = (__force pers_result_t) 0,		/* no result/none/not supported in device driver */
+	PERS_RESULT_CAN_RECOVER = (__force pers_result_t) 1,	/* Device driver can recover without slot reset */
+	PERS_RESULT_NEED_RESET = (__force pers_result_t) 2,	/* Device driver wants slot to be reset. */
+	PERS_RESULT_DISCONNECT = (__force pers_result_t) 3,	/* Device has completely failed, is unrecoverable */
+	PERS_RESULT_RECOVERED = (__force pers_result_t) 4,	/* Device driver is fully recovered and operational */
 };
 
 /* PCI bus error event callbacks */
 struct pci_error_handlers
 {
 	/* PCI bus error detected on this device */
-	int (*error_detected)(struct pci_dev *dev,
-	                      enum pci_channel_state error);
+	pers_result_t (*error_detected)(struct pci_dev *dev,
+	                      pci_channel_state_t error);
 
 	/* MMIO has been re-enabled, but not DMA */
-	int (*mmio_enabled)(struct pci_dev *dev);
+	pers_result_t (*mmio_enabled)(struct pci_dev *dev);
 
 	/* PCI Express link has been reset */
-	int (*link_reset)(struct pci_dev *dev);
+	pers_result_t (*link_reset)(struct pci_dev *dev);
 
 	/* PCI slot has been reset */
-	int (*slot_reset)(struct pci_dev *dev);
+	pers_result_t (*slot_reset)(struct pci_dev *dev);
 
 	/* Device driver may resume normal operations */
 	void (*resume)(struct pci_dev *dev);

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 2/7]: Revised [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver
  2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
  2005-11-07 19:59             ` Christoph Hellwig
  2005-11-07 20:03             ` Greg KH
@ 2005-11-07 21:30             ` linas
  2005-11-07 21:40               ` Brian King
  2005-11-07 21:31             ` [PATCH 3/7]: Revised [PATCH 28/42]: SCSI: add PCI error recovery to Symbios " linas
                               ` (4 subsequent siblings)
  7 siblings, 1 reply; 131+ messages in thread
From: linas @ 2005-11-07 21:30 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, bluesmoke-devel, Paul Mackerras, linuxppc64-dev,
	linux-pci, Brian King

On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the IPR SCSI device driver.
The patch has been tested, and appears to work well.

Please apply.

Signed-off-by: Linas Vepstas <linas@linas.org>
Signed-off-by: Brian King <brking@us.ibm.com>

--
Index: linux-2.6.14-mm1/drivers/scsi/ipr.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/ipr.c	2005-11-07 13:55:27.986920072 -0600
+++ linux-2.6.14-mm1/drivers/scsi/ipr.c	2005-11-07 15:02:00.639392946 -0600
@@ -5328,6 +5328,94 @@
 				shutdown_type);
 }
 
+/* --------------- PCI Error Recovery infrastructure ----------- */
+/** If the PCI slot is frozen, hold off all i/o
+ *  activity; then, as soon as the slot is available again,
+ *  initiate an adapter reset.
+ */
+static int ipr_reset_freeze(struct ipr_cmnd *ipr_cmd)
+{
+	/* Disallow new interrupts, avoid loop */
+	ipr_cmd->ioa_cfg->allow_interrupts = 0;
+	list_add_tail(&ipr_cmd->queue, &ipr_cmd->ioa_cfg->pending_q);
+	ipr_cmd->done = ipr_reset_ioa_job;
+	return IPR_RC_JOB_RETURN;
+}
+
+/** ipr_eeh_frozen -- called when slot has experience PCI bus error.
+ *  This routine is called to tell us that the PCI bus is down.
+ *  Can't do anything here, except put the device driver into a
+ *  holding pattern, waiting for the PCI bus to come back.
+ */
+static void ipr_eeh_frozen (struct pci_dev *pdev)
+{
+	unsigned long flags = 0;
+	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
+
+	spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
+	_ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_freeze, IPR_SHUTDOWN_NONE);
+	spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
+}
+
+/** ipr_eeh_slot_reset - called when pci slot has been reset.
+ *
+ * This routine is called by the pci error recovery recovery
+ * code after the PCI slot has been reset, just before we
+ * should resume normal operations.
+ */
+static pers_result_t ipr_eeh_slot_reset(struct pci_dev *pdev)
+{
+	unsigned long flags = 0;
+	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
+
+	// pci_enable_device(pdev);
+	// pci_set_master(pdev);
+	spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
+	_ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space,
+	                                 IPR_SHUTDOWN_NONE);
+	spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
+
+	return PERS_RESULT_RECOVERED;
+}
+
+/** This routine is called when the PCI bus has permanently
+ *  failed.  This routine should purge all pending I/O and
+ *  shut down the device driver (close and unload).
+ */
+static void ipr_eeh_perm_failure(struct pci_dev *pdev)
+{
+	unsigned long flags = 0;
+	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
+
+	spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
+	if (ioa_cfg->sdt_state == WAIT_FOR_DUMP)
+		ioa_cfg->sdt_state = ABORT_DUMP;
+	ioa_cfg->reset_retries = IPR_NUM_RESET_RELOAD_RETRIES;
+	ioa_cfg->in_ioa_bringdown = 1;
+	ipr_initiate_ioa_reset(ioa_cfg, IPR_SHUTDOWN_NONE);
+	spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
+}
+
+static pers_result_t ipr_eeh_error_detected(struct pci_dev *pdev,
+                                pci_channel_state_t state)
+{
+	switch (state) {
+		case pci_channel_io_frozen:
+			ipr_eeh_frozen (pdev);
+			return PERS_RESULT_NEED_RESET;
+
+		case pci_channel_io_perm_failure:
+			ipr_eeh_perm_failure (pdev);
+			return PERS_RESULT_DISCONNECT;
+			break;
+		default:
+			break;
+	}
+	return PERS_RESULT_NEED_RESET;
+}
+
+/* ------------- end of PCI Error Recovery suport ----------- */
+
 /**
  * ipr_probe_ioa_part2 - Initializes IOAs found in ipr_probe_ioa(..)
  * @ioa_cfg:	ioa cfg struct
@@ -6065,12 +6153,18 @@
 };
 MODULE_DEVICE_TABLE(pci, ipr_pci_table);
 
+static struct pci_error_handlers ipr_err_handler = {
+	.error_detected = ipr_eeh_error_detected,
+	.slot_reset = ipr_eeh_slot_reset,
+};
+
 static struct pci_driver ipr_driver = {
 	.name = IPR_NAME,
 	.id_table = ipr_pci_table,
 	.probe = ipr_probe,
 	.remove = ipr_remove,
 	.shutdown = ipr_shutdown,
+	.err_handler = &ipr_err_handler,
 };
 
 /**

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 3/7]: Revised [PATCH 28/42]: SCSI: add PCI error recovery to Symbios dev driver
  2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
                               ` (2 preceding siblings ...)
  2005-11-07 21:30             ` [PATCH 2/7]: Revised [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver linas
@ 2005-11-07 21:31             ` linas
  2005-11-07 21:34             ` [PATCH 4/7]: Revised [PATCH 29/42]: ethernet: add PCI error recovery to e100 " linas
                               ` (3 subsequent siblings)
  7 siblings, 0 replies; 131+ messages in thread
From: linas @ 2005-11-07 21:31 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, bluesmoke-devel, Paul Mackerras, linuxppc64-dev, linux-pci

On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.


Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the Symbios SCSI device driver.
The patch has been tested, and appears to work well.

Signed-off-by: Linas Vepstas <linas@linas.org>

--
Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_glue.c	2005-11-07 13:55:26.839081234 -0600
+++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c	2005-11-07 15:02:08.152337375 -0600
@@ -686,6 +686,10 @@
 
 	if (DEBUG_FLAGS & DEBUG_TINY) printf_debug ("[");
 
+	/* Avoid spinloop trying to handle interrupts on frozen device */
+	if (np->s.io_state != pci_channel_io_normal)
+		return IRQ_HANDLED;
+
 	spin_lock_irqsave(np->s.host->host_lock, flags);
 	sym_interrupt(np);
 	spin_unlock_irqrestore(np->s.host->host_lock, flags);
@@ -759,6 +763,25 @@
  */
 static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); }
 
+static void sym_eeh_timeout(u_long p)
+{
+	struct sym_eh_wait *ep = (struct sym_eh_wait *) p;
+	if (!ep)
+		return;
+	complete(&ep->done);
+}
+
+static void sym_eeh_done(struct sym_eh_wait *ep)
+{
+	if (!ep)
+		return;
+	ep->timed_out = 0;
+	if (!del_timer(&ep->timer))
+		return;
+
+	complete(&ep->done);
+}
+
 /*
  *  Generic method for our eh processing.
  *  The 'op' argument tells what we have to do.
@@ -799,6 +822,35 @@
 
 	/* Try to proceed the operation we have been asked for */
 	sts = -1;
+
+	/* We may be in an error condition because the PCI bus
+	 * went down. In this case, we need to wait until the
+	 * PCI bus is reset, the card is reset, and only then
+	 * proceed with the scsi error recovery.  We'll wait
+	 * for 15 seconds for this to happen.
+	 */
+#define WAIT_FOR_PCI_RECOVERY	15
+	if (np->s.io_state != pci_channel_io_normal) {
+		struct sym_eh_wait eeh, *eep = &eeh;
+		np->s.io_reset_wait = eep;
+		init_completion(&eep->done);
+		init_timer(&eep->timer);
+		eep->to_do = SYM_EH_DO_WAIT;
+		eep->timer.expires = jiffies + (WAIT_FOR_PCI_RECOVERY*HZ);
+		eep->timer.function = sym_eeh_timeout;
+		eep->timer.data = (u_long)eep;
+		eep->timed_out = 1;	/* Be pessimistic for once :) */
+		add_timer(&eep->timer);
+		spin_unlock_irq(np->s.host->host_lock);
+		wait_for_completion(&eep->done);
+		spin_lock_irq(np->s.host->host_lock);
+		if (eep->timed_out) {
+			printk (KERN_ERR "%s: Timed out waiting for PCI reset\n",
+			       sym_name(np));
+		}
+		np->s.io_reset_wait = NULL;
+	}
+
 	switch(op) {
 	case SYM_EH_ABORT:
 		sts = sym_abort_scsiio(np, cmd, 1);
@@ -1584,6 +1636,8 @@
 	np->maxoffs	= dev->chip.offset_max;
 	np->maxburst	= dev->chip.burst_max;
 	np->myaddr	= dev->host_id;
+	np->s.io_state = pci_channel_io_normal;
+	np->s.io_reset_wait = NULL;
 
 	/*
 	 *  Edit its name.
@@ -1916,6 +1970,58 @@
 	return 1;
 }
 
+/* ------------- PCI Error Recovery infrastructure -------------- */
+/** sym2_io_error_detected() is called when PCI error is detected */
+static pers_result_t sym2_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct sym_hcb *np = pci_get_drvdata(pdev);
+
+	np->s.io_state = state;
+	// XXX If slot is permanently frozen, then what?
+	// Should we scsi_remove_host() maybe ??
+
+	/* Request a slot slot reset. */
+	return PERS_RESULT_NEED_RESET;
+}
+
+/** sym2_io_slot_reset is called when the pci bus has been reset.
+ *  Restart the card from scratch. */
+static pers_result_t sym2_io_slot_reset (struct pci_dev *pdev)
+{
+	struct sym_hcb *np = pci_get_drvdata(pdev);
+
+	printk (KERN_INFO "%s: recovering from a PCI slot reset\n",
+	    sym_name(np));
+
+	if (pci_enable_device(pdev))
+		printk (KERN_ERR "%s: device setup failed most egregiously\n",
+			    sym_name(np));
+
+	pci_set_master(pdev);
+	enable_irq (pdev->irq);
+
+	/* Perform host reset only on one instance of the card */
+	if (0 == PCI_FUNC (pdev->devfn))
+		sym_reset_scsi_bus(np, 0);
+
+	return PERS_RESULT_RECOVERED;
+}
+
+/** sym2_io_resume is called when the error recovery driver
+ *  tells us that its OK to resume normal operation.
+ */
+static void sym2_io_resume (struct pci_dev *pdev)
+{
+	struct sym_hcb *np = pci_get_drvdata(pdev);
+
+	/* Perform device startup only once for this card. */
+	if (0 == PCI_FUNC (pdev->devfn))
+		sym_start_up (np, 1);
+
+	np->s.io_state = pci_channel_io_normal;
+	sym_eeh_done (np->s.io_reset_wait);
+}
+
 /*
  * Driver host template.
  */
@@ -2169,11 +2275,18 @@
 
 MODULE_DEVICE_TABLE(pci, sym2_id_table);
 
+static struct pci_error_handlers sym2_err_handler = {
+	.error_detected = sym2_io_error_detected,
+	.slot_reset = sym2_io_slot_reset,
+	.resume = sym2_io_resume,
+};
+
 static struct pci_driver sym2_driver = {
 	.name		= NAME53C8XX,
 	.id_table	= sym2_id_table,
 	.probe		= sym2_probe,
 	.remove		= __devexit_p(sym2_remove),
+	.err_handler = &sym2_err_handler,
 };
 
 static int __init sym2_init(void)
Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.h
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_glue.h	2005-11-07 13:55:26.839081234 -0600
+++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.h	2005-11-07 15:02:08.154337094 -0600
@@ -181,6 +181,10 @@
 	char		chip_name[8];
 	struct pci_dev	*device;
 
+	/* pci bus i/o state; waiter for clearing of i/o state */
+	pci_channel_state_t io_state;
+	struct sym_eh_wait *io_reset_wait;
+
 	struct Scsi_Host *host;
 
 	void __iomem *	ioaddr;		/* MMIO kernel io address	*/
Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_hipd.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_hipd.c	2005-11-07 13:55:26.840081093 -0600
+++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_hipd.c	2005-11-07 15:02:08.162335970 -0600
@@ -2810,6 +2810,7 @@
 	u_char	istat, istatc;
 	u_char	dstat;
 	u_short	sist;
+	u_int    icnt;
 
 	/*
 	 *  interrupt on the fly ?
@@ -2851,6 +2852,7 @@
 	sist	= 0;
 	dstat	= 0;
 	istatc	= istat;
+	icnt = 0;
 	do {
 		if (istatc & SIP)
 			sist  |= INW(np, nc_sist);
@@ -2858,6 +2860,19 @@
 			dstat |= INB(np, nc_dstat);
 		istatc = INB(np, nc_istat);
 		istat |= istatc;
+		
+		/* Prevent deadlock waiting on a condition that may never clear. */
+		/* XXX this is a temporary kludge; the correct to detect
+		 * a PCI bus error would be to use the io_check interfaces
+		 * proposed by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
+		 * Problem with polling like that is the state flag might not
+		 * be set.
+		 */
+		icnt ++;
+		if (100 < icnt) {
+			if (np->s.device->error_state != pci_channel_io_normal)
+				return;
+		}
 	} while (istatc & (SIP|DIP));
 
 	if (DEBUG_FLAGS & DEBUG_TINY)

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 4/7]: Revised [PATCH 29/42]: ethernet: add PCI error recovery to e100 dev driver
  2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
                               ` (3 preceding siblings ...)
  2005-11-07 21:31             ` [PATCH 3/7]: Revised [PATCH 28/42]: SCSI: add PCI error recovery to Symbios " linas
@ 2005-11-07 21:34             ` linas
  2005-11-07 21:36             ` [PATCH: 5/7]: Revised: [PATCH 30/42]: ethernet: add PCI error recovery to e1000 " linas
                               ` (2 subsequent siblings)
  7 siblings, 0 replies; 131+ messages in thread
From: linas @ 2005-11-07 21:34 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, bluesmoke-devel, Paul Mackerras, linuxppc64-dev, linux-pci

On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the intel ethernet e100
device driver. The patch has been tested, and appears to work well.

Please apply.

Signed-off-by: Linas Vepstas <linas@linas.org>

--
Index: linux-2.6.14-mm1/drivers/net/e100.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/e100.c	2005-11-07 13:55:26.363148057 -0600
+++ linux-2.6.14-mm1/drivers/net/e100.c	2005-11-07 15:02:11.120920287 -0600
@@ -2465,6 +2465,75 @@
 }
 
 
+/* ------------------ PCI Error Recovery infrastructure  -------------- */
+/** e100_io_error_detected() is called when PCI error is detected */
+static pers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+
+	/* Same as calling e100_down(netdev_priv(netdev)), but generic */
+	netdev->stop(netdev);
+
+	/* Is a detach needed ?? */
+	// netif_device_detach(netdev);
+
+	/* Request a slot reset. */
+	return PERS_RESULT_NEED_RESET;
+}
+
+/** e100_io_slot_reset is called after the pci bus has been reset.
+ *  Restart the card from scratch. */
+static pers_result_t e100_io_slot_reset(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nic *nic = netdev_priv(netdev);
+
+	if(pci_enable_device(pdev)) {
+		printk(KERN_ERR "e100: Cannot re-enable PCI device after reset.\n");
+		return PERS_RESULT_DISCONNECT;
+	}
+	pci_set_master(pdev);
+
+	/* Only one device per card can do a reset */
+	if (0 != PCI_FUNC (pdev->devfn))
+		return PERS_RESULT_RECOVERED;
+
+	e100_hw_reset(nic);
+	e100_phy_init(nic);
+
+	if(e100_hw_init(nic)) {
+		DPRINTK(HW, ERR, "e100_hw_init failed\n");
+		return PERS_RESULT_DISCONNECT;
+	}
+
+	return PERS_RESULT_RECOVERED;
+}
+
+/** e100_io_resume is called when the error recovery driver
+ *  tells us that its OK to resume normal operation.
+ */
+static void e100_io_resume(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nic *nic = netdev_priv(netdev);
+
+	/* ack any pending wake events, disable PME */
+	pci_enable_wake(pdev, 0, 0);
+
+	netif_device_attach(netdev);
+	if(netif_running(netdev)) {
+		e100_open (netdev);
+		mod_timer(&nic->watchdog, jiffies);
+	}
+}
+
+static struct pci_error_handlers e100_err_handler = {
+	.error_detected = e100_io_error_detected,
+	.slot_reset = e100_io_slot_reset,
+	.resume = e100_io_resume,
+};
+
+
 static struct pci_driver e100_driver = {
 	.name =         DRV_NAME,
 	.id_table =     e100_id_table,
@@ -2475,6 +2544,7 @@
 	.resume =       e100_resume,
 #endif
 	.shutdown =	e100_shutdown,
+	.err_handler = &e100_err_handler,
 };
 
 static int __init e100_init_module(void)

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH: 5/7]: Revised: [PATCH 30/42]: ethernet: add PCI error recovery to e1000 dev driver
  2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
                               ` (4 preceding siblings ...)
  2005-11-07 21:34             ` [PATCH 4/7]: Revised [PATCH 29/42]: ethernet: add PCI error recovery to e100 " linas
@ 2005-11-07 21:36             ` linas
  2005-11-07 21:37             ` [PATCH 6/7]: Revised [PATCH 31/42]: ethernet: add PCI error recovery to ixgb " linas
  2005-11-07 21:39             ` [PATCH 7/7]: Revised [PATCH 32/42]: RFC: Add compile-time config options linas
  7 siblings, 0 replies; 131+ messages in thread
From: linas @ 2005-11-07 21:36 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, bluesmoke-devel, Paul Mackerras, linuxppc64-dev, linux-pci

On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the intel gigabit
ethernet e1000 device driver. The patch has been tested, and appears
to work well.

Please apply.

Signed-off-by: Linas Vepstas <linas@linas.org>

--
Index: linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/e1000/e1000_main.c	2005-11-07 13:55:25.948206317 -0600
+++ linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c	2005-11-07 15:02:12.811682734 -0600
@@ -206,6 +206,16 @@
 void e1000_rx_schedule(void *data);
 #endif
 
+static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state);
+static pers_result_t e1000_io_slot_reset(struct pci_dev *pdev);
+static void e1000_io_resume(struct pci_dev *pdev);
+
+static struct pci_error_handlers e1000_err_handler = {
+	.error_detected = e1000_io_error_detected,
+	.slot_reset = e1000_io_slot_reset,
+	.resume = e1000_io_resume,
+};
+
 /* Exported from other modules */
 
 extern void e1000_check_options(struct e1000_adapter *adapter);
@@ -218,8 +228,9 @@
 	/* Power Managment Hooks */
 #ifdef CONFIG_PM
 	.suspend  = e1000_suspend,
-	.resume   = e1000_resume
+	.resume   = e1000_resume,
 #endif
+	.err_handler = &e1000_err_handler,
 };
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -2938,6 +2949,10 @@
 
 #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
 
+	/* Prevent stats update while adapter is being reset */
+	if (adapter->link_speed == 0)
+		return;
+
 	spin_lock_irqsave(&adapter->stats_lock, flags);
 
 	/* these counters are modified from e1000_adjust_tbi_stats,
@@ -4359,4 +4374,88 @@
 }
 #endif
 
+/* --------------- PCI Error Recovery infrastructure ------------ */
+/** e1000_io_error_detected() is called when PCI error is detected */
+static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+
+	if (netif_running(netdev))
+		e1000_down(adapter);
+
+	/* Request a slot slot reset. */
+	return PERS_RESULT_NEED_RESET;
+}
+
+/** e1000_io_slot_reset is called after the pci bus has been reset.
+ *  Restart the card from scratch.
+ *  Implementation resembles the first-half of the
+ *  e1000_resume routine.
+ */
+static pers_result_t e1000_io_slot_reset(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+
+	if (pci_enable_device(pdev)) {
+		printk(KERN_ERR "e1000: Cannot re-enable PCI device after reset.\n");
+		return PERS_RESULT_DISCONNECT;
+	}
+	pci_set_master(pdev);
+
+	pci_enable_wake(pdev, 3, 0);
+	pci_enable_wake(pdev, 4, 0); /* 4 == D3 cold */
+
+	/* Perform card reset only on one instance of the card */
+	if(0 != PCI_FUNC (pdev->devfn))
+		return PERS_RESULT_RECOVERED;
+
+	e1000_reset(adapter);
+	E1000_WRITE_REG(&adapter->hw, WUS, ~0);
+
+	return PERS_RESULT_RECOVERED;
+}
+
+/** e1000_io_resume is called when the error recovery driver
+ *  tells us that its OK to resume normal operation.
+ *  Implementation resembles the second-half of the
+ *  e1000_resume routine.
+ */
+static void e1000_io_resume(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct e1000_adapter *adapter = netdev->priv;
+	uint32_t manc, swsm;
+
+	if(netif_running(netdev)) {
+		if (e1000_up(adapter)) {
+			printk("e1000: can't bring device back up after reset\n");
+			return;
+		}
+	}
+
+	netif_device_attach(netdev);
+
+	if(adapter->hw.mac_type >= e1000_82540 &&
+	    adapter->hw.media_type == e1000_media_type_copper) {
+		manc = E1000_READ_REG(&adapter->hw, MANC);
+		manc &= ~(E1000_MANC_ARP_EN);
+		E1000_WRITE_REG(&adapter->hw, MANC, manc);
+	}
+
+	switch(adapter->hw.mac_type) {
+	case e1000_82573:
+		swsm = E1000_READ_REG(&adapter->hw, SWSM);
+		E1000_WRITE_REG(&adapter->hw, SWSM,
+				swsm | E1000_SWSM_DRV_LOAD);
+		break;
+	default:
+		break;
+	}
+
+	if(netif_running(netdev))
+		mod_timer(&adapter->watchdog_timer, jiffies);
+}
+
 /* e1000_main.c */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 6/7]: Revised [PATCH 31/42]: ethernet: add PCI error recovery to ixgb dev driver
  2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
                               ` (5 preceding siblings ...)
  2005-11-07 21:36             ` [PATCH: 5/7]: Revised: [PATCH 30/42]: ethernet: add PCI error recovery to e1000 " linas
@ 2005-11-07 21:37             ` linas
  2005-11-07 21:39             ` [PATCH 7/7]: Revised [PATCH 32/42]: RFC: Add compile-time config options linas
  7 siblings, 0 replies; 131+ messages in thread
From: linas @ 2005-11-07 21:37 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, bluesmoke-devel, Paul Mackerras, linuxppc64-dev, linux-pci

On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.

Replace-Subject: PCI Error Recovery: ixgb network device driver

Various PCI bus errors can be signaled by newer PCI controllers.  This
patch adds the PCI error recovery callbacks to the intel ten-gigabit
ethernet ixgb device driver. The patch has been tested, and appears
to work well.

Signed-off-by: Linas Vepstas <linas@linas.org>

--
Index: linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/ixgb/ixgb_main.c	2005-11-07 13:55:25.431278896 -0600
+++ linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c	2005-11-07 15:02:14.779406268 -0600
@@ -132,6 +132,16 @@
 static void ixgb_netpoll(struct net_device *dev);
 #endif
 
+static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state);
+static pers_result_t ixgb_io_slot_reset (struct pci_dev *pdev);
+static void ixgb_io_resume (struct pci_dev *pdev);
+
+static struct pci_error_handlers ixgb_err_handler = {
+	.error_detected = ixgb_io_error_detected,
+	.slot_reset = ixgb_io_slot_reset,
+	.resume = ixgb_io_resume,
+};
+
 /* Exported from other modules */
 
 extern void ixgb_check_options(struct ixgb_adapter *adapter);
@@ -141,6 +151,8 @@
 	.id_table = ixgb_pci_tbl,
 	.probe    = ixgb_probe,
 	.remove   = __devexit_p(ixgb_remove),
+	.err_handler = &ixgb_err_handler,
+
 };
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -1654,8 +1666,16 @@
 	unsigned int i;
 #endif
 
+#ifdef XXX_CONFIG_IXGB_EEH_RECOVERY
+	if(unlikely(icr==EEH_IO_ERROR_VALUE(4))) {
+		if (eeh_slot_is_isolated (adapter->pdev))
+		// disable_irq_nosync (adapter->pdev->irq);
+		return IRQ_NONE;      /* Not our interrupt */
+	}
+#else
 	if(unlikely(!icr))
 		return IRQ_NONE;  /* Not our interrupt */
+#endif /* CONFIG_IXGB_EEH_RECOVERY */
 
 	if(unlikely(icr & (IXGB_INT_RXSEQ | IXGB_INT_LSC))) {
 		mod_timer(&adapter->watchdog_timer, jiffies);
@@ -2125,4 +2145,70 @@
 }
 #endif
 
+/* -------------- PCI Error Recovery infrastructure ---------------- */
+/** ixgb_io_error_detected() is called when PCI error is detected */
+static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct ixgb_adapter *adapter = netdev->priv;
+
+	if(netif_running(netdev))
+		ixgb_down(adapter, TRUE);
+
+	/* Request a slot reset. */
+	return PERS_RESULT_NEED_RESET;
+}
+
+/** ixgb_io_slot_reset is called after the pci bus has been reset.
+ *  Restart the card from scratch.
+ *  Implementation resembles the first-half of the
+ *  ixgb_resume routine.
+ */
+static pers_result_t ixgb_io_slot_reset (struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct ixgb_adapter *adapter = netdev->priv;
+
+	if(pci_enable_device(pdev)) {
+		printk(KERN_ERR "ixgb: Cannot re-enable PCI device after reset.\n");
+		return PERS_RESULT_DISCONNECT;
+	}
+	pci_set_master(pdev);
+
+	/* Perform card reset only on one instance of the card */
+	if (0 != PCI_FUNC (pdev->devfn))
+		return PERS_RESULT_RECOVERED;
+
+	ixgb_reset(adapter);
+
+	return PERS_RESULT_RECOVERED;
+}
+
+/** ixgb_io_resume is called when the error recovery driver
+ *  tells us that its OK to resume normal operation.
+ *  Implementation resembles the second-half of the
+ *  ixgb_resume routine.
+ */
+static void ixgb_io_resume (struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct ixgb_adapter *adapter = netdev->priv;
+
+	if(netif_running(netdev)) {
+		if(ixgb_up(adapter)) {
+			printk ("ixgb: can't bring device back up after reset\n");
+			return;
+		}
+	}
+
+	netif_device_attach(netdev);
+	if(netif_running(netdev))
+		mod_timer(&adapter->watchdog_timer, jiffies);
+
+	/* Reading all-ff's from the adapter will completely hose
+	 * the counts and statistics. So just clear them out */
+	memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats));
+	ixgb_update_stats(adapter);
+}
+
 /* ixgb_main.c */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 1/7]: PCI revised (2) [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 21:21               ` [PATCH 1/7]: PCI revised (2) " linas
@ 2005-11-07 21:37                 ` Greg KH
  2005-11-07 21:54                   ` Linus Torvalds
  2005-11-07 22:43                   ` [PATCH 1/7]: PCI revised (3) " linas
  0 siblings, 2 replies; 131+ messages in thread
From: Greg KH @ 2005-11-07 21:37 UTC (permalink / raw)
  To: linas, linux-sparse
  Cc: Paul Mackerras, linuxppc64-dev, johnrose, linux-pci,
	bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 03:21:28PM -0600, linas wrote:
> +typedef int __bitwise pci_channel_state_t;

Closer but...

>  enum pci_channel_state {
> -	pci_channel_io_normal = 0,	/* I/O channel is in normal state */
> -	pci_channel_io_frozen = 1,	/* I/O to channel is blocked */
> -	pci_channel_io_perm_failure,	/* PCI card is dead */
> +	pci_channel_io_normal = (__force pci_channel_state_t) 0,	/* I/O channel is in normal state */
> +	pci_channel_io_frozen = (__force pci_channel_state_t) 1,	/* I/O to channel is blocked */
> +	pci_channel_io_perm_failure = (__force pci_channel_state_t) 2,	/* PCI card is dead */
>  };

You don't have to use an enum anymore, just use a #define.

Sparse developers, I see code in the kernel that that does both 
(__force foo_t) and (foo_t __force).  Which one is correct?


> +typedef int __bitwise pers_result_t;

Ugh, I don't like that name, but I can't think of anything better right
now.  You should at least keep "pci" at the beginning to make it make
more sense to people looking at it for the first time.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 7/7]: Revised [PATCH 32/42]: RFC: Add compile-time config options
  2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
                               ` (6 preceding siblings ...)
  2005-11-07 21:37             ` [PATCH 6/7]: Revised [PATCH 31/42]: ethernet: add PCI error recovery to ixgb " linas
@ 2005-11-07 21:39             ` linas
  7 siblings, 0 replies; 131+ messages in thread
From: linas @ 2005-11-07 21:39 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, bluesmoke-devel, Paul Mackerras, linuxppc64-dev, linux-pci

On Mon, Nov 07, 2005 at 01:57:27PM -0600, linas was heard to remark:
> On Mon, Nov 07, 2005 at 10:27:27AM -0800, Greg KH was heard to remark:
> > 3) realy strong typing that sparse can detect.

This OPTIONAL/RFC patch adds ifdef's around the PCI error recovery code in the 
various device drivers. This patch is "optional" in that its a little bit 
messy, but it does solve a little problem.

-- The good news: this gives some users (e.g. embeddd systems) the option 
	of not compiling in this code, thus making thier device drivers a tiny 
	bit smaller.

-- The bad news: This also clutters up the drivers with extraneous markup 
   and the config process with yet another config.

Please apply if you agree with the need for this patch :)

Signed-off-by: Linas Vepstas <linas@linas.org>

Index: linux-2.6.14-mm1/drivers/scsi/ipr.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/ipr.c	2005-11-07 15:02:00.639392946 -0600
+++ linux-2.6.14-mm1/drivers/scsi/ipr.c	2005-11-07 15:02:20.029668601 -0600
@@ -5329,6 +5329,8 @@
 }
 
 /* --------------- PCI Error Recovery infrastructure ----------- */
+#ifdef CONFIG_PCI_ERR_RECOVERY
+
 /** If the PCI slot is frozen, hold off all i/o
  *  activity; then, as soon as the slot is available again,
  *  initiate an adapter reset.
@@ -5414,6 +5416,7 @@
 	return PERS_RESULT_NEED_RESET;
 }
 
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 /* ------------- end of PCI Error Recovery suport ----------- */
 
 /**
@@ -6153,10 +6156,12 @@
 };
 MODULE_DEVICE_TABLE(pci, ipr_pci_table);
 
+#ifdef CONFIG_PCI_ERR_RECOVERY
 static struct pci_error_handlers ipr_err_handler = {
 	.error_detected = ipr_eeh_error_detected,
 	.slot_reset = ipr_eeh_slot_reset,
 };
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 
 static struct pci_driver ipr_driver = {
 	.name = IPR_NAME,
@@ -6164,7 +6169,9 @@
 	.probe = ipr_probe,
 	.remove = ipr_remove,
 	.shutdown = ipr_shutdown,
+#ifdef CONFIG_PCI_ERR_RECOVERY
 	.err_handler = &ipr_err_handler,
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 };
 
 /**
Index: linux-2.6.14-mm1/drivers/pci/Kconfig
===================================================================
--- linux-2.6.14-mm1.orig/drivers/pci/Kconfig	2005-11-07 13:55:23.869498177 -0600
+++ linux-2.6.14-mm1/drivers/pci/Kconfig	2005-11-07 15:02:20.030668460 -0600
@@ -13,6 +13,21 @@
 
 	   If you don't know what to do here, say N.
 
+config PCI_ERR_RECOVERY
+	bool "PCI Error Recovery support"
+	depends on PCI
+	depends on PPC_PSERIES
+	default y
+	help
+	   PCI Error Recovery is a mechanism by which crashed/hung 
+		PCI adapters are automatically detected and rebooted without
+		otherwise disturbing the operation of the system.  Support
+		for this recovery requires special PCI bridge chips (some
+		PCI-E chips may have this support) as well as support in 
+		the device drivers (not all device drivers can handle this).
+
+	   When in doubt, say Y.
+
 config PCI_LEGACY_PROC
 	bool "Legacy /proc/pci interface"
 	depends on PCI
Index: linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/scsi/sym53c8xx_2/sym_glue.c	2005-11-07 15:02:08.152337375 -0600
+++ linux-2.6.14-mm1/drivers/scsi/sym53c8xx_2/sym_glue.c	2005-11-07 15:02:20.034667898 -0600
@@ -763,6 +763,7 @@
  */
 static void sym_eh_timeout(u_long p) { __sym_eh_done((struct scsi_cmnd *)p, 1); }
 
+#ifdef CONFIG_PCI_ERR_RECOVERY
 static void sym_eeh_timeout(u_long p)
 {
 	struct sym_eh_wait *ep = (struct sym_eh_wait *) p;
@@ -781,6 +782,7 @@
 
 	complete(&ep->done);
 }
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 
 /*
  *  Generic method for our eh processing.
@@ -823,6 +825,7 @@
 	/* Try to proceed the operation we have been asked for */
 	sts = -1;
 
+#ifdef CONFIG_PCI_ERR_RECOVERY
 	/* We may be in an error condition because the PCI bus
 	 * went down. In this case, we need to wait until the
 	 * PCI bus is reset, the card is reset, and only then
@@ -850,6 +853,7 @@
 		}
 		np->s.io_reset_wait = NULL;
 	}
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 
 	switch(op) {
 	case SYM_EH_ABORT:
@@ -1971,6 +1975,7 @@
 }
 
 /* ------------- PCI Error Recovery infrastructure -------------- */
+#ifdef CONFIG_PCI_ERR_RECOVERY
 /** sym2_io_error_detected() is called when PCI error is detected */
 static pers_result_t sym2_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state)
 {
@@ -2021,6 +2026,7 @@
 	np->s.io_state = pci_channel_io_normal;
 	sym_eeh_done (np->s.io_reset_wait);
 }
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 
 /*
  * Driver host template.
@@ -2275,18 +2281,22 @@
 
 MODULE_DEVICE_TABLE(pci, sym2_id_table);
 
+#ifdef CONFIG_PCI_ERR_RECOVERY
 static struct pci_error_handlers sym2_err_handler = {
 	.error_detected = sym2_io_error_detected,
 	.slot_reset = sym2_io_slot_reset,
 	.resume = sym2_io_resume,
 };
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 
 static struct pci_driver sym2_driver = {
 	.name		= NAME53C8XX,
 	.id_table	= sym2_id_table,
 	.probe		= sym2_probe,
 	.remove		= __devexit_p(sym2_remove),
+#ifdef CONFIG_PCI_ERR_RECOVERY
 	.err_handler = &sym2_err_handler,
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 };
 
 static int __init sym2_init(void)
Index: linux-2.6.14-mm1/drivers/net/e100.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/e100.c	2005-11-07 15:02:11.120920287 -0600
+++ linux-2.6.14-mm1/drivers/net/e100.c	2005-11-07 15:02:20.038667336 -0600
@@ -2466,6 +2466,7 @@
 
 
 /* ------------------ PCI Error Recovery infrastructure  -------------- */
+#ifdef CONFIG_PCI_ERR_RECOVERY
 /** e100_io_error_detected() is called when PCI error is detected */
 static pers_result_t e100_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
 {
@@ -2532,6 +2533,7 @@
 	.slot_reset = e100_io_slot_reset,
 	.resume = e100_io_resume,
 };
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 
 
 static struct pci_driver e100_driver = {
@@ -2544,7 +2546,9 @@
 	.resume =       e100_resume,
 #endif
 	.shutdown =	e100_shutdown,
+#ifdef CONFIG_PCI_ERR_RECOVERY
 	.err_handler = &e100_err_handler,
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 };
 
 static int __init e100_init_module(void)
Index: linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/e1000/e1000_main.c	2005-11-07 15:02:12.811682734 -0600
+++ linux-2.6.14-mm1/drivers/net/e1000/e1000_main.c	2005-11-07 15:02:20.071662701 -0600
@@ -206,6 +206,7 @@
 void e1000_rx_schedule(void *data);
 #endif
 
+#ifdef CONFIG_PCI_ERR_RECOVERY
 static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state);
 static pers_result_t e1000_io_slot_reset(struct pci_dev *pdev);
 static void e1000_io_resume(struct pci_dev *pdev);
@@ -215,6 +216,7 @@
 	.slot_reset = e1000_io_slot_reset,
 	.resume = e1000_io_resume,
 };
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 
 /* Exported from other modules */
 
@@ -230,7 +232,9 @@
 	.suspend  = e1000_suspend,
 	.resume   = e1000_resume,
 #endif
+#ifdef CONFIG_PCI_ERR_RECOVERY
 	.err_handler = &e1000_err_handler,
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 };
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -4375,6 +4379,7 @@
 #endif
 
 /* --------------- PCI Error Recovery infrastructure ------------ */
+#ifdef CONFIG_PCI_ERR_RECOVERY
 /** e1000_io_error_detected() is called when PCI error is detected */
 static pers_result_t e1000_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
 {
@@ -4457,5 +4462,6 @@
 	if(netif_running(netdev))
 		mod_timer(&adapter->watchdog_timer, jiffies);
 }
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 
 /* e1000_main.c */
Index: linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c
===================================================================
--- linux-2.6.14-mm1.orig/drivers/net/ixgb/ixgb_main.c	2005-11-07 15:02:14.779406268 -0600
+++ linux-2.6.14-mm1/drivers/net/ixgb/ixgb_main.c	2005-11-07 15:02:20.075662139 -0600
@@ -132,6 +132,7 @@
 static void ixgb_netpoll(struct net_device *dev);
 #endif
 
+#ifdef CONFIG_PCI_ERR_RECOVERY
 static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state);
 static pers_result_t ixgb_io_slot_reset (struct pci_dev *pdev);
 static void ixgb_io_resume (struct pci_dev *pdev);
@@ -141,6 +142,7 @@
 	.slot_reset = ixgb_io_slot_reset,
 	.resume = ixgb_io_resume,
 };
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 
 /* Exported from other modules */
 
@@ -151,8 +153,9 @@
 	.id_table = ixgb_pci_tbl,
 	.probe    = ixgb_probe,
 	.remove   = __devexit_p(ixgb_remove),
+#ifdef CONFIG_PCI_ERR_RECOVERY
 	.err_handler = &ixgb_err_handler,
-
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 };
 
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
@@ -2146,6 +2149,7 @@
 #endif
 
 /* -------------- PCI Error Recovery infrastructure ---------------- */
+#ifdef CONFIG_PCI_ERR_RECOVERY
 /** ixgb_io_error_detected() is called when PCI error is detected */
 static pers_result_t ixgb_io_error_detected (struct pci_dev *pdev, pci_channel_state_t state)
 {
@@ -2210,5 +2214,6 @@
 	memset(&adapter->stats, 0, sizeof(struct ixgb_hw_stats));
 	ixgb_update_stats(adapter);
 }
+#endif /* CONFIG_PCI_ERR_RECOVERY */
 
 /* ixgb_main.c */

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 2/7]: Revised [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver
  2005-11-07 21:30             ` [PATCH 2/7]: Revised [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver linas
@ 2005-11-07 21:40               ` Brian King
  2005-11-07 22:03                 ` linas
  0 siblings, 1 reply; 131+ messages in thread
From: Brian King @ 2005-11-07 21:40 UTC (permalink / raw)
  To: linas
  Cc: Greg KH, linux-kernel, bluesmoke-devel, Paul Mackerras,
	linuxppc64-dev, linux-pci

linas wrote:
> +/** ipr_eeh_slot_reset - called when pci slot has been reset.
> + *
> + * This routine is called by the pci error recovery recovery
> + * code after the PCI slot has been reset, just before we
> + * should resume normal operations.
> + */
> +static pers_result_t ipr_eeh_slot_reset(struct pci_dev *pdev)
> +{
> +	unsigned long flags = 0;
> +	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
> +
> +	// pci_enable_device(pdev);
> +	// pci_set_master(pdev);

I assume you want remove these two lines... The pci config space
restore in ipr's reset handling should cover them.

> +	spin_lock_irqsave(ioa_cfg->host->host_lock, flags);
> +	_ipr_initiate_ioa_reset(ioa_cfg, ipr_reset_restore_cfg_space,
> +	                                 IPR_SHUTDOWN_NONE);
> +	spin_unlock_irqrestore(ioa_cfg->host->host_lock, flags);
> +
> +	return PERS_RESULT_RECOVERED;
> +}



-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 1/7]: PCI revised (2) [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 21:37                 ` Greg KH
@ 2005-11-07 21:54                   ` Linus Torvalds
  2005-11-07 22:54                     ` Greg KH
  2005-11-07 22:43                   ` [PATCH 1/7]: PCI revised (3) " linas
  1 sibling, 1 reply; 131+ messages in thread
From: Linus Torvalds @ 2005-11-07 21:54 UTC (permalink / raw)
  To: Greg KH
  Cc: linas, linux-sparse, Paul Mackerras, linuxppc64-dev, johnrose,
	linux-pci, bluesmoke-devel, linux-kernel

On Mon, 7 Nov 2005, Greg KH wrote:
> 
> >  enum pci_channel_state {
> > -	pci_channel_io_normal = 0,	/* I/O channel is in normal state */
> > -	pci_channel_io_frozen = 1,	/* I/O to channel is blocked */
> > -	pci_channel_io_perm_failure,	/* PCI card is dead */
> > +	pci_channel_io_normal = (__force pci_channel_state_t) 0,	/* I/O channel is in normal state */
> > +	pci_channel_io_frozen = (__force pci_channel_state_t) 1,	/* I/O to channel is blocked */
> > +	pci_channel_io_perm_failure = (__force pci_channel_state_t) 2,	/* PCI card is dead */
> >  };
> 
> You don't have to use an enum anymore, just use a #define.

The enum works fine, though, and has less namespace pollution than a 
#define, so sometimes an enum can be preferred.

HOWEVER. For sanity, if possible please avoid using the value "0". It's 
magic for __bitwise, in that a zero is always acceptable as a bitwise 
thing (which makes sense if you think of bitwise as being about bits: the 
zero representation is totally independent of any bit ordering).

So it's better to start counting from 1 if possible.

> Sparse developers, I see code in the kernel that that does both 
> (__force foo_t) and (foo_t __force).  Which one is correct?

sparse doesn't care. Whatever scans better for humans. Attributes like 
"force" parse the same way things like "const" and "volatile" parses, and 
while most people _tend_ to write "const int", it's not incorrect to write 
"int const". Same with "__attribute__((force))", aka __force.

			Linus

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 2/7]: Revised [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver
  2005-11-07 21:40               ` Brian King
@ 2005-11-07 22:03                 ` linas
  0 siblings, 0 replies; 131+ messages in thread
From: linas @ 2005-11-07 22:03 UTC (permalink / raw)
  To: Brian King
  Cc: Greg KH, linux-kernel, bluesmoke-devel, Paul Mackerras,
	linuxppc64-dev, linux-pci

On Mon, Nov 07, 2005 at 03:40:32PM -0600, Brian King was heard to remark:
> linas wrote:
> > +/** ipr_eeh_slot_reset - called when pci slot has been reset.
> > + *
> > + * This routine is called by the pci error recovery recovery
> > + * code after the PCI slot has been reset, just before we
> > + * should resume normal operations.
> > + */
> > +static pers_result_t ipr_eeh_slot_reset(struct pci_dev *pdev)
> > +{
> > +	unsigned long flags = 0;
> > +	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
> > +
> > +	// pci_enable_device(pdev);
> > +	// pci_set_master(pdev);
> 
> I assume you want remove these two lines... The pci config space
> restore in ipr's reset handling should cover them.

Yes, I do. Its cruft left over from old test and debug cycles. :(


^ permalink raw reply	[flat|nested] 131+ messages in thread

* [PATCH 1/7]: PCI revised (3) [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 21:37                 ` Greg KH
  2005-11-07 21:54                   ` Linus Torvalds
@ 2005-11-07 22:43                   ` linas
  2005-11-07 22:53                     ` Greg KH
  1 sibling, 1 reply; 131+ messages in thread
From: linas @ 2005-11-07 22:43 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-sparse, Paul Mackerras, linuxppc64-dev, johnrose,
	linux-pci, bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 01:37:29PM -0800, Greg KH was heard to remark:
> On Mon, Nov 07, 2005 at 03:21:28PM -0600, linas wrote:
> > +typedef int __bitwise pci_channel_state_t;
> 
> You don't have to use an enum anymore, just use a #define.

Per Linus's remarks about namespace pollution, I've kept the enums.

> > +typedef int __bitwise pers_result_t;
> 
> You should at least keep "pci" at the beginning to make it make
> more sense to people looking at it for the first time.

PCI_ERS and pci_ers, then.

I'm feeling like a blinkin' spammer, splatting out all these emails.

--linas

PCI Error Recovery: header file patch

Change enums and subroutine signatures to be strongly typed, per recent
discussion with GregKH. Also, change the acronym to the more unique, 
less generic "PCI-ERS" "PCI Error Recovery System".

Please apply.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

--
Index: linux-2.6.14-mm1/include/linux/pci.h
===================================================================
--- linux-2.6.14-mm1.orig/include/linux/pci.h	2005-11-07 13:55:28.000000000 -0600
+++ linux-2.6.14-mm1/include/linux/pci.h	2005-11-07 16:34:29.790592784 -0600
@@ -82,10 +82,12 @@
  *  the pci device.  If some PCI bus between here and the pci device
  *  has crashed or locked up, this info is reflected here.
  */
+typedef int __bitwise pci_channel_state_t;
+
 enum pci_channel_state {
-	pci_channel_io_normal = 0,	/* I/O channel is in normal state */
-	pci_channel_io_frozen = 1,	/* I/O to channel is blocked */
-	pci_channel_io_perm_failure,	/* PCI card is dead */
+	pci_channel_io_normal = (__force pci_channel_state_t) 1,	/* I/O channel is in normal state */
+	pci_channel_io_frozen = (__force pci_channel_state_t) 2,	/* I/O to channel is blocked */
+	pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,	/* PCI card is dead */
 };
 
 /*
@@ -121,7 +123,7 @@
 					   this is D0-D3, D0 being fully functional,
 					   and D3 being off. */
 
-	enum pci_channel_state error_state;	/* current connectivity state */
+	pci_channel_state_t error_state;	/* current connectivity state */
 	struct	device	dev;		/* Generic device interface */
 
 	/* device is compatible with these IDs */
@@ -245,35 +247,46 @@
 };
 
 /* ---------------------------------------------------------------- */
-/** PCI error recovery infrastructure.  If a PCI device driver provides
+/** PCI Error Recovery System (PCI-ERS).  If a PCI device driver provides
  *  a set fof callbacks in struct pci_error_handlers, then that device driver
  *  will be notified of PCI bus errors, and will be driven to recovery
  *  when an error occurs.
  */
 
-enum pcierr_result {
-	PCIERR_RESULT_NONE = 0,		/* no result/none/not supported in device driver */
-	PCIERR_RESULT_CAN_RECOVER=1,	/* Device driver can recover without slot reset */
-	PCIERR_RESULT_NEED_RESET,	/* Device driver wants slot to be reset. */
-	PCIERR_RESULT_DISCONNECT,	/* Device has completely failed, is unrecoverable */
-	PCIERR_RESULT_RECOVERED,	/* Device driver is fully recovered and operational */
+typedef int __bitwise pci_ers_result_t;
+
+enum pci_ers_result {
+	/* no result/none/not supported in device driver */
+	PCI_ERS_RESULT_NONE = (__force pci_ers_result_t) 1,
+	
+	/* Device driver can recover without slot reset */
+	PCI_ERS_RESULT_CAN_RECOVER = (__force pci_ers_result_t) 2,
+	
+	/* Device driver wants slot to be reset. */
+	PCI_ERS_RESULT_NEED_RESET = (__force pci_ers_result_t) 3,
+	
+	/* Device has completely failed, is unrecoverable */
+	PCI_ERS_RESULT_DISCONNECT = (__force pci_ers_result_t) 4,
+	
+	/* Device driver is fully recovered and operational */
+	PCI_ERS_RESULT_RECOVERED = (__force pci_ers_result_t) 5,
 };
 
 /* PCI bus error event callbacks */
 struct pci_error_handlers
 {
 	/* PCI bus error detected on this device */
-	int (*error_detected)(struct pci_dev *dev,
-	                      enum pci_channel_state error);
+	pci_ers_result_t (*error_detected)(struct pci_dev *dev,
+	                      pci_channel_state_t error);
 
 	/* MMIO has been re-enabled, but not DMA */
-	int (*mmio_enabled)(struct pci_dev *dev);
+	pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev);
 
 	/* PCI Express link has been reset */
-	int (*link_reset)(struct pci_dev *dev);
+	pci_ers_result_t (*link_reset)(struct pci_dev *dev);
 
 	/* PCI slot has been reset */
-	int (*slot_reset)(struct pci_dev *dev);
+	pci_ers_result_t (*slot_reset)(struct pci_dev *dev);
 
 	/* Device driver may resume normal operations */
 	void (*resume)(struct pci_dev *dev);

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 1/7]: PCI revised (3) [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 22:43                   ` [PATCH 1/7]: PCI revised (3) " linas
@ 2005-11-07 22:53                     ` Greg KH
  2005-11-07 23:19                       ` linas
  0 siblings, 1 reply; 131+ messages in thread
From: Greg KH @ 2005-11-07 22:53 UTC (permalink / raw)
  To: linas
  Cc: linux-sparse, Paul Mackerras, linuxppc64-dev, johnrose,
	linux-pci, bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 04:43:38PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 01:37:29PM -0800, Greg KH was heard to remark:
> > On Mon, Nov 07, 2005 at 03:21:28PM -0600, linas wrote:
> > > +typedef int __bitwise pci_channel_state_t;
> > 
> > You don't have to use an enum anymore, just use a #define.
> 
> Per Linus's remarks about namespace pollution, I've kept the enums.

That's fine.

> > > +typedef int __bitwise pers_result_t;
> > 
> > You should at least keep "pci" at the beginning to make it make
> > more sense to people looking at it for the first time.
> 
> PCI_ERS and pci_ers, then.

Sounds good.

> I'm feeling like a blinkin' spammer, splatting out all these emails.

Care to just resend the whole series over again?  No "patch on top of
patch" stuff is needed here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 1/7]: PCI revised (2) [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 21:54                   ` Linus Torvalds
@ 2005-11-07 22:54                     ` Greg KH
  0 siblings, 0 replies; 131+ messages in thread
From: Greg KH @ 2005-11-07 22:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linas, linux-sparse, Paul Mackerras, linuxppc64-dev, johnrose,
	linux-pci, bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 01:54:35PM -0800, Linus Torvalds wrote:
> On Mon, 7 Nov 2005, Greg KH wrote:
> > >  enum pci_channel_state {
> > > -	pci_channel_io_normal = 0,	/* I/O channel is in normal state */
> > > -	pci_channel_io_frozen = 1,	/* I/O to channel is blocked */
> > > -	pci_channel_io_perm_failure,	/* PCI card is dead */
> > > +	pci_channel_io_normal = (__force pci_channel_state_t) 0,	/* I/O channel is in normal state */
> > > +	pci_channel_io_frozen = (__force pci_channel_state_t) 1,	/* I/O to channel is blocked */
> > > +	pci_channel_io_perm_failure = (__force pci_channel_state_t) 2,	/* PCI card is dead */
> > >  };
> > 
> > You don't have to use an enum anymore, just use a #define.
> 
> The enum works fine, though, and has less namespace pollution than a 
> #define, so sometimes an enum can be preferred.

Good point.

> > Sparse developers, I see code in the kernel that that does both 
> > (__force foo_t) and (foo_t __force).  Which one is correct?
> 
> sparse doesn't care. Whatever scans better for humans. Attributes like 
> "force" parse the same way things like "const" and "volatile" parses, and 
> while most people _tend_ to write "const int", it's not incorrect to write 
> "int const". Same with "__attribute__((force))", aka __force.

Ok, thanks for clearing this up.

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 1/7]: PCI revised (3) [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 22:53                     ` Greg KH
@ 2005-11-07 23:19                       ` linas
  2005-11-08  2:43                         ` Greg KH
  0 siblings, 1 reply; 131+ messages in thread
From: linas @ 2005-11-07 23:19 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-sparse, Paul Mackerras, linuxppc64-dev, johnrose,
	linux-pci, bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 02:53:08PM -0800, Greg KH was heard to remark:
> > I'm feeling like a blinkin' spammer, splatting out all these emails.
> 
> Care to just resend the whole series over again?  No "patch on top of
> patch" stuff is needed here.

So that I can avoid that spammin' feelin' ... 

I'll send patches against -git10, then, so as to start with a clean
slate; unless you wanted something aginst -mm1?

"The whole series": do you want all 42 patches? Or just the seven
discussed today?

-----

In the series-of-42, the staging of some of the patches in the 
middle require simultaneous update to both the drivers/pci/hotplug
and the arch/powerpc/xxx; otherwise, build breaks result. I am
not sure how to handle that: the obvious solution is to split these
up... but that will probably result in a bigger series, and was
not a step I wanted to take unless someone asked...

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-11-07 20:41                   ` linas
  2005-11-07 20:46                     ` Greg KH
@ 2005-11-08  1:11                     ` Steven Rostedt
  2005-11-08  1:18                       ` Neil Brown
  2005-11-08 23:23                       ` typedefs and structs linas
  1 sibling, 2 replies; 131+ messages in thread
From: Steven Rostedt @ 2005-11-08  1:11 UTC (permalink / raw)
  To: linas
  Cc: linux-kernel, bluesmoke-devel, linux-pci, johnrose,
	linuxppc64-dev, Paul Mackerras, Greg KH

On Mon, 2005-11-07 at 14:41 -0600, linas wrote:

> 
> > > Also, "grep typedef include/linux/*" shows that many kernel device
> > > drivers use this convention.
> > 
> > They are wrong and should be fixed.
> 
> What, precisely, is wrong?

I can't seem to find it on google, but IIRC Linus stated that he didn't
want any more structures defined with typedefs.  If it is a structure,
simple keep it one, and don't use typedef to get rid of "struct".

This was for the simple reason, too many developers were passing
structures by value instead of by reference, just because they were
using a type that they didn't realize was a structure. And to make
things worse, these structures started to get bigger.

So in my every day programming, I switched to not typedef structures
anymore, and I even found some places that I passed structures by value
when it would have been much more efficient by reference.

The only exceptions that I can see where you typedef a structure is for
use with arch dependent types, like atomic_t. 

-- Steve

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-11-08  1:11                     ` Steven Rostedt
@ 2005-11-08  1:18                       ` Neil Brown
  2005-11-08 23:36                         ` typedefs and structs linas
  2005-12-16 13:09                         ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] Denis Vlasenko
  2005-11-08 23:23                       ` typedefs and structs linas
  1 sibling, 2 replies; 131+ messages in thread
From: Neil Brown @ 2005-11-08  1:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linas, linux-kernel, bluesmoke-devel, linux-pci, johnrose,
	linuxppc64-dev, Paul Mackerras, Greg KH

On Monday November 7, rostedt@goodmis.org wrote:
> 
> This was for the simple reason, too many developers were passing
> structures by value instead of by reference, just because they were
> using a type that they didn't realize was a structure. And to make
> things worse, these structures started to get bigger.
> 

Another reason  for not using typedefs is that if you do, and you want
to refer to the structure in some other include file, you have to
#include the include file that devices the structure.
If you don't use typedefs, you can just say:

   struct foo;

and the compiler will happily wait for the complete definition later
(providing it doesn't need the size in the meanwhile). 
So avoiding typedef means that you can sometimes avoid excess
#includes, which means faster compiling.

NeilBrown

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: [PATCH 1/7]: PCI revised (3) [PATCH 16/42]: PCI:  PCI Error reporting callbacks
  2005-11-07 23:19                       ` linas
@ 2005-11-08  2:43                         ` Greg KH
  0 siblings, 0 replies; 131+ messages in thread
From: Greg KH @ 2005-11-08  2:43 UTC (permalink / raw)
  To: linas
  Cc: linux-sparse, Paul Mackerras, linuxppc64-dev, johnrose,
	linux-pci, bluesmoke-devel, linux-kernel

On Mon, Nov 07, 2005 at 05:19:55PM -0600, linas wrote:
> On Mon, Nov 07, 2005 at 02:53:08PM -0800, Greg KH was heard to remark:
> > > I'm feeling like a blinkin' spammer, splatting out all these emails.
> > 
> > Care to just resend the whole series over again?  No "patch on top of
> > patch" stuff is needed here.
> 
> So that I can avoid that spammin' feelin' ... 
> 
> I'll send patches against -git10, then, so as to start with a clean
> slate; unless you wanted something aginst -mm1?

-git10 would be great.

> "The whole series": do you want all 42 patches? Or just the seven
> discussed today?

Just the 7 discussed.  The others should go to their proper maintainers
(which I am not.)

> -----
> 
> In the series-of-42, the staging of some of the patches in the 
> middle require simultaneous update to both the drivers/pci/hotplug
> and the arch/powerpc/xxx; otherwise, build breaks result. I am
> not sure how to handle that: the obvious solution is to split these
> up... but that will probably result in a bigger series, and was
> not a step I wanted to take unless someone asked...

The drivers/pci/hotplug/ stuff only touches the rpaphp driver, right?
If so, I don't have a problem with Paul/Ben sending those on with the
other PPC64 changes to keep everything building properly for your arch.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-08  1:11                     ` Steven Rostedt
  2005-11-08  1:18                       ` Neil Brown
@ 2005-11-08 23:23                       ` linas
  2005-11-08 23:33                         ` Steven Rostedt
                                           ` (2 more replies)
  1 sibling, 3 replies; 131+ messages in thread
From: linas @ 2005-11-08 23:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, bluesmoke-devel, linux-pci, johnrose, linuxppc64-dev

On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark:
> On Mon, 2005-11-07 at 14:41 -0600, linas wrote:
> 
> don't use typedef to get rid of "struct".
> 
> This was for the simple reason, too many developers were passing
> structures by value instead of by reference, just because they were
> using a type that they didn't realize was a structure. 

That's a rather bizarre mistake to make, since, in order to 
access a values in such a beast, you have to use a dot . instead 
of an arrow -> and so it hits ou in the face that you passed a value
instead of a reference.

----
Off-topic: There's actually a neat little trick in C++ that can 
help avoid accidentally passing null pointers.  One can declare 
function declarations as:

  int func (sturct blah &v) {
    v.a ++; 
    return v.b;
  }

The ampersand says "pass argument by reference (so as to get arg passing
efficiency) but force coder to write code as if they were passing by value"
As a result, it gets difficult to pass null pointers (for reasons
similar to the difficulty of passing null pointers in Java (and yes,
I loathe Java, sorry to subject you to that))  Anyway, that's a C++ trick 
only; I wish it was in C so I could experiment more and find out if I 
like it or hate it.

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-08 23:23                       ` typedefs and structs linas
@ 2005-11-08 23:33                         ` Steven Rostedt
  2005-11-09  9:22                           ` Bernd Petrovitsch
  2005-11-08 23:57                         ` Kyle Moffett
  2005-11-08 23:57                         ` David Gibson
  2 siblings, 1 reply; 131+ messages in thread
From: Steven Rostedt @ 2005-11-08 23:33 UTC (permalink / raw)
  To: linas; +Cc: linux-kernel, bluesmoke-devel, linux-pci, johnrose, linuxppc64-dev

On Tue, 2005-11-08 at 17:23 -0600, linas wrote:
> On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark:
> > On Mon, 2005-11-07 at 14:41 -0600, linas wrote:
> > 
> > don't use typedef to get rid of "struct".
> > 
> > This was for the simple reason, too many developers were passing
> > structures by value instead of by reference, just because they were
> > using a type that they didn't realize was a structure. 
> 
> That's a rather bizarre mistake to make, since, in order to 
> access a values in such a beast, you have to use a dot . instead 
> of an arrow -> and so it hits ou in the face that you passed a value
> instead of a reference.

It happens when you access the variable via macros and other routines
that you notice that takes and address of the variable, so you just pass
in the address of the current local variable.

> 
> ----
> Off-topic: There's actually a neat little trick in C++ that can 
> help avoid accidentally passing null pointers.  One can declare 
> function declarations as:
> 
>   int func (sturct blah &v) {
>     v.a ++; 
>     return v.b;
>   }
> 
> The ampersand says "pass argument by reference (so as to get arg passing
> efficiency) but force coder to write code as if they were passing by value"
> As a result, it gets difficult to pass null pointers (for reasons
> similar to the difficulty of passing null pointers in Java (and yes,
> I loathe Java, sorry to subject you to that))  Anyway, that's a C++ trick 
> only; I wish it was in C so I could experiment more and find out if I 
> like it or hate it.
> 

Actually, the true pass by reference (not by pointer) is one of the
things that C++ has, that I wish C had.

-- Steve



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-08  1:18                       ` Neil Brown
@ 2005-11-08 23:36                         ` linas
  2005-12-16 13:09                         ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] Denis Vlasenko
  1 sibling, 0 replies; 131+ messages in thread
From: linas @ 2005-11-08 23:36 UTC (permalink / raw)
  To: Neil Brown
  Cc: Steven Rostedt, linux-kernel, bluesmoke-devel, linux-pci,
	johnrose, linuxppc64-dev

On Tue, Nov 08, 2005 at 12:18:42PM +1100, Neil Brown was heard to remark:
> 
> Another reason  for not using typedefs is that if you do, and you want
> to refer to the structure in some other include file, you have to
> #include the include file that devices the structure.
> If you don't use typedefs, you can just say:
> 
>    struct foo;
> 
> and the compiler will happily wait for the complete definition later
> (providing it doesn't need the size in the meanwhile). 

Yes, this is the "forward declaration" problem I was refering to. 
Its unavoidable if structs have circular references to each other.

However, I've learned, by experience, several things by trying to
eliminate such forward declarations (and the related #include hell):

-- Its really, really hard, and right in the middle, you think,
   "gosh this is a stupid idea, why am I bothering?"

-- When you get done, you think: "wow this new code structure
   is so insanely better than the old code! The guy who wrote
   the old code should be hung from a yardarm as an example!"

So having a mechanism that prevents coders from declaring 
"struct foo" whenever they feel like it can be a good thing.
Of course, your milage may vary.

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-08 23:23                       ` typedefs and structs linas
  2005-11-08 23:33                         ` Steven Rostedt
@ 2005-11-08 23:57                         ` Kyle Moffett
  2005-11-09  0:30                           ` linas
  2005-11-08 23:57                         ` David Gibson
  2 siblings, 1 reply; 131+ messages in thread
From: Kyle Moffett @ 2005-11-08 23:57 UTC (permalink / raw)
  To: linas
  Cc: Steven Rostedt, linux-kernel, bluesmoke-devel, linux-pci,
	johnrose, linuxppc64-dev

On Nov 8, 2005, at 18:23:27, linas wrote:
> Off-topic: There's actually a neat little trick in C++ that can  
> help avoid accidentally passing null pointers.  One can declare  
> function declarations as:
>
>   int func (sturct blah &v) {
>     v.a ++;
>     return v.b;
>   }
>
> The ampersand says "pass argument by reference (so as to get arg  
> passing efficiency) but force coder to write code as if they were  
> passing by value" As a result, it gets difficult to pass null  
> pointers (for reasons similar to the difficulty of passing null  
> pointers in Java (and yes, I loathe Java, sorry to subject you to  
> that))  Anyway, that's a C++ trick  only; I wish it was in C so I  
> could experiment more and find out if I like it or hate it.

That technique tends to cause more problems than it solves.  If I  
write the following code:

struct foo the_leftmost_foo = get_leftmost_foo();
do_some_stuff(the_leftmost_foo);

How do I know what it is going to do?  Will it modify  
the_leftmost_foo, or is it a pass-by-value as it appears?  This is  
just as bad as defining a macro some_macro(foo,bar) that does (foo =  
bar), it's _really_ hard to tell what it does, especially when you  
aren't all that familiar with the code.  A much better solution is this:

void do_some_stuff(struct foo *the_foo) __attribute__((__nonnull__(1)));

do_some_stuff(&the_leftmost_foo);

That ensures that the first argument cannot be explicitly passed as  
null, while still being quite obvious to the programmer what it's doing.

Cheers,
Kyle Moffett

--
They _will_ find opposing experts to say it isn't, if you push hard  
enough the wrong way.  Idiots with a PhD aren't hard to buy.
   -- Rob Landley




^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-08 23:23                       ` typedefs and structs linas
  2005-11-08 23:33                         ` Steven Rostedt
  2005-11-08 23:57                         ` Kyle Moffett
@ 2005-11-08 23:57                         ` David Gibson
  2005-11-09  0:13                           ` Zan Lynx
  2 siblings, 1 reply; 131+ messages in thread
From: David Gibson @ 2005-11-08 23:57 UTC (permalink / raw)
  To: linas
  Cc: Steven Rostedt, linuxppc64-dev, linux-pci, linux-kernel, bluesmoke-devel

On Tue, Nov 08, 2005 at 05:23:27PM -0600, Linas Vepstas wrote:
> On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark:
> > On Mon, 2005-11-07 at 14:41 -0600, linas wrote:
> > 
> > don't use typedef to get rid of "struct".
> > 
> > This was for the simple reason, too many developers were passing
> > structures by value instead of by reference, just because they were
> > using a type that they didn't realize was a structure. 
> 
> That's a rather bizarre mistake to make, since, in order to 
> access a values in such a beast, you have to use a dot . instead 
> of an arrow -> and so it hits ou in the face that you passed a value
> instead of a reference.
> 
> ----
> Off-topic: There's actually a neat little trick in C++ that can 
> help avoid accidentally passing null pointers.  One can declare 
> function declarations as:
> 
>   int func (sturct blah &v) {
>     v.a ++; 
>     return v.b;
>   }
> 
> The ampersand says "pass argument by reference (so as to get arg passing
> efficiency) but force coder to write code as if they were passing by value"
> As a result, it gets difficult to pass null pointers (for reasons
> similar to the difficulty of passing null pointers in Java (and yes,
> I loathe Java, sorry to subject you to that))  Anyway, that's a C++ trick 
> only; I wish it was in C so I could experiment more and find out if I 
> like it or hate it.

I hate it: it obscures the fact that it's a pass-by-reference at the
callsite, which is useful information.  Although this is, admittedly,
the least confusing use of C++ reference types.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-08 23:57                         ` David Gibson
@ 2005-11-09  0:13                           ` Zan Lynx
  2005-11-09  0:42                             ` linas
  0 siblings, 1 reply; 131+ messages in thread
From: Zan Lynx @ 2005-11-09  0:13 UTC (permalink / raw)
  To: David Gibson
  Cc: linas, Steven Rostedt, linuxppc64-dev, linux-pci, linux-kernel,
	bluesmoke-devel

[-- Attachment #1: Type: text/plain, Size: 1357 bytes --]

On Wed, 2005-11-09 at 10:57 +1100, David Gibson wrote:
> On Tue, Nov 08, 2005 at 05:23:27PM -0600, Linas Vepstas wrote:
[snip]
> > The ampersand says "pass argument by reference (so as to get arg passing
> > efficiency) but force coder to write code as if they were passing by value"
> > As a result, it gets difficult to pass null pointers (for reasons
> > similar to the difficulty of passing null pointers in Java (and yes,
> > I loathe Java, sorry to subject you to that))  Anyway, that's a C++ trick 
> > only; I wish it was in C so I could experiment more and find out if I 
> > like it or hate it.
> 
> I hate it: it obscures the fact that it's a pass-by-reference at the
> callsite, which is useful information.  Although this is, admittedly,
> the least confusing use of C++ reference types.

I agree with you about that one.  It's yet another thing for C
programmers to have to learn to watch for C++ doing behind your back.

However, it isn't any worse than having an ordinary C pointer to some
struct.  If the pointer was passed to the current function from above,
and you're passing it to another function below, you really don't know
what's going to happen to the structure unless you go look.  Just like
the C++ reference, the C pointer doesn't get an address-of operator to
remind you.
-- 
Zan Lynx <zlynx@acm.org>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-08 23:57                         ` Kyle Moffett
@ 2005-11-09  0:30                           ` linas
  2005-11-09  0:37                             ` Douglas McNaught
  0 siblings, 1 reply; 131+ messages in thread
From: linas @ 2005-11-09  0:30 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Steven Rostedt, linux-kernel, bluesmoke-devel, linux-pci, linuxppc64-dev

On Tue, Nov 08, 2005 at 06:57:11PM -0500, Kyle Moffett was heard to remark:
> On Nov 8, 2005, at 18:23:27, linas wrote:
> >Off-topic: There's actually a neat little trick in C++ that can  
> >help avoid accidentally passing null pointers.  One can declare  
> >function declarations as:
> >
> >  int func (sturct blah &v) {
> >    v.a ++;
> >    return v.b;
> >  }
> >
> >The ampersand says "pass argument by reference (so as to get arg  
> >passing efficiency) but force coder to write code as if they were  
> >passing by value" As a result, it gets difficult to pass null  
> >pointers (for reasons similar to the difficulty of passing null  
> >pointers in Java (and yes, I loathe Java, sorry to subject you to  
> >that))  Anyway, that's a C++ trick  only; I wish it was in C so I  
> >could experiment more and find out if I like it or hate it.
> 
> That technique tends to cause more problems than it solves.  If I  
> write the following code:
> 
> struct foo the_leftmost_foo = get_leftmost_foo();
> do_some_stuff(the_leftmost_foo);
> 
> How do I know what it is going to do?  

It depends on how do_some_stuff() was declared. If its declared as

   do_some_stuff (struct foo &x)

then it will be a pass by reference.

> A much better solution is this:
> 
> void do_some_stuff(struct foo *the_foo) __attribute__((__nonnull__(1)));

Think of it as "syntactic sugar": the compiler "does the right thing"
without all the grungy extra markup such as __atribute. 

(Remember that at the dawn of time, C++ was just a bunch of pre-processor
markup that did nothing but hide grunge like __attribute__((whatever))
from the programmer. Only later did it become a language. Doing markup
like what you're suggesting is only a tiny step away from inventing a new
language, esp if you come up with some clever, unobtrusive markup for
it.)

> That ensures that the first argument cannot be explicitly passed as  
> null, 

Well, this misses the point. No one intentionally passes null pointers.
Its just that "shit happens". Pass-by-reference changes your coding
style. You tend to alloc on stack instead of malloc.  And then, since
its on stack, you know it would be very wrong to keep a pointer to it, 
and so you don't, you design code differently.  Usually, you discover 
you never really needed to hold a pointer to it anyway; you just did so
out of some ingrained habit.

And since its on stack, you can't leak memory, you don't need to 
reference count it. Much fewer mallocs & frees, so less likely to have
errors there. Better performance, and less memory fragmentation, for
what that's worth.

I dunno, I did this once on a larger, year-long project, and rather
liked it (I otherwise don't much like C++, since people tend to use
it in bad, horrible ways). I won't say this is the greatest 
coding style in the world, but it does change the way you think about 
designing code, mostly for the better.

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09  0:30                           ` linas
@ 2005-11-09  0:37                             ` Douglas McNaught
  2005-11-09  0:48                               ` linas
  0 siblings, 1 reply; 131+ messages in thread
From: Douglas McNaught @ 2005-11-09  0:37 UTC (permalink / raw)
  To: linas
  Cc: Kyle Moffett, Steven Rostedt, linux-kernel, bluesmoke-devel,
	linux-pci, linuxppc64-dev

linas <linas@austin.ibm.com> writes:

> On Tue, Nov 08, 2005 at 06:57:11PM -0500, Kyle Moffett was heard to remark:

>> That technique tends to cause more problems than it solves.  If I  
>> write the following code:
>> 
>> struct foo the_leftmost_foo = get_leftmost_foo();
>> do_some_stuff(the_leftmost_foo);
>> 
>> How do I know what it is going to do?  
>
> It depends on how do_some_stuff() was declared. If its declared as
>
>    do_some_stuff (struct foo &x)
>
> then it will be a pass by reference.

Yeah, but if you're trying to read that code, you have to go look up
the declaration to figure out whether it might affect 'foo' or not.
And if you get it wrong, you get silent data corruption.

I'd rather pass a pointer explicitly and crash with a segfault if
someone passes NULL--at least then it's pellucidly clear what went
wrong.

-Doug

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09  0:13                           ` Zan Lynx
@ 2005-11-09  0:42                             ` linas
  2005-11-09  9:25                               ` Bernd Petrovitsch
  0 siblings, 1 reply; 131+ messages in thread
From: linas @ 2005-11-09  0:42 UTC (permalink / raw)
  To: Zan Lynx
  Cc: David Gibson, Steven Rostedt, linuxppc64-dev, linux-pci,
	linux-kernel, bluesmoke-devel

On Tue, Nov 08, 2005 at 05:13:48PM -0700, Zan Lynx was heard to remark:
> On Wed, 2005-11-09 at 10:57 +1100, David Gibson wrote:
> > 
> > I hate it: it obscures the fact that it's a pass-by-reference at the
> > callsite, which is useful information.  Although this is, admittedly,
> > the least confusing use of C++ reference types.
> 
> I agree with you about that one.  It's yet another thing for C
> programmers to have to learn to watch for C++ doing behind your back.

I think you're rushing to judgement on something you've never tried. 
It fundamentally changes coding style; you'd have to try it on some 
mid-size project for at least a few months or longer to get into the
mindset.  To make it all work, you also have to do other things, like 
avoid mallocs and allocing on stack, which forces major changes of 
style (because of the lifetime of things on stack). If you don't change 
style to go with it, then you'll just end up in debug hell, in which
case you'd be right: it would be a (very) bad idea.

(Disclaimer: I've moved away from C++ because of all the other
opportunities for misuse that it offers and encourages.)

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09  0:37                             ` Douglas McNaught
@ 2005-11-09  0:48                               ` linas
  2005-11-09  0:59                                 ` Douglas McNaught
  2005-11-09  1:51                                 ` Kyle Moffett
  0 siblings, 2 replies; 131+ messages in thread
From: linas @ 2005-11-09  0:48 UTC (permalink / raw)
  To: Douglas McNaught
  Cc: Kyle Moffett, Steven Rostedt, linux-kernel, bluesmoke-devel,
	linux-pci, linuxppc64-dev

On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was heard to remark:
> 
> Yeah, but if you're trying to read that code, you have to go look up
> the declaration to figure out whether it might affect 'foo' or not.
> And if you get it wrong, you get silent data corruption.

No, that is not what "pass by reference" means. You are thinking of
"const", maybe, or "pass by value"; this is neither.  The arg is not 
declared const, the subroutine can (and usually will) modify the contents 
of the structure, and so the caller will be holding a modified structure
when the callee returns (just like it would if a pointer was passed).

--linas



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09  0:48                               ` linas
@ 2005-11-09  0:59                                 ` Douglas McNaught
  2005-11-09  2:14                                   ` Dmitry Torokhov
  2005-11-09  1:51                                 ` Kyle Moffett
  1 sibling, 1 reply; 131+ messages in thread
From: Douglas McNaught @ 2005-11-09  0:59 UTC (permalink / raw)
  To: linas
  Cc: Kyle Moffett, Steven Rostedt, linux-kernel, bluesmoke-devel,
	linux-pci, linuxppc64-dev

linas <linas@austin.ibm.com> writes:

> On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was heard to remark:
>> 
>> Yeah, but if you're trying to read that code, you have to go look up
>> the declaration to figure out whether it might affect 'foo' or not.
>> And if you get it wrong, you get silent data corruption.
>
> No, that is not what "pass by reference" means. You are thinking of
> "const", maybe, or "pass by value"; this is neither.  The arg is not 
> declared const, the subroutine can (and usually will) modify the contents 
> of the structure, and so the caller will be holding a modified structure
> when the callee returns (just like it would if a pointer was passed).

Right.  My point is only that it's not clear from looking at the call
site whether a struct passed by reference will be modified by the
callee (some people pass by reference just for "efficiency").  And if
the called function modifies the data without the caller's knowledge,
it leads to obscure bugs.  Whereas if you pass a pointer, it's
immediately clear that the called function can modify the pointed-to
object.

-Doug

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09  0:48                               ` linas
  2005-11-09  0:59                                 ` Douglas McNaught
@ 2005-11-09  1:51                                 ` Kyle Moffett
  2005-11-09 10:16                                   ` J.A. Magallon
  1 sibling, 1 reply; 131+ messages in thread
From: Kyle Moffett @ 2005-11-09  1:51 UTC (permalink / raw)
  To: linas
  Cc: Douglas McNaught, Steven Rostedt, linux-kernel, bluesmoke-devel,
	linux-pci, linuxppc64-dev

On Nov 8, 2005, at 19:48:08, linas wrote:
> On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was  
> heard to remark:
>> Yeah, but if you're trying to read that code, you have to go look  
>> up the declaration to figure out whether it might affect 'foo' or  
>> not. And if you get it wrong, you get silent data corruption.
>
> No, that is not what "pass by reference" means. You are thinking of  
> "const", maybe, or "pass by value"; this is neither.  The arg is  
> not declared const, the subroutine can (and usually will) modify  
> the contents of the structure, and so the caller will be holding a  
> modified structure when the callee returns (just like it would if a  
> pointer was passed).

Pass by value in C:
do_some_stuff(arg1, arg2);

Pass by reference in C:
do_some_stuff(&arg1, &arg2);

This is very obvious what it does.  The compiler does type-checks to  
make sure you don't get it wrong.  There are tools to check stack  
usage of functions too.  This is inherently obvious what the code  
does without looking at a completely different file where the  
function is defined.


Pass by value in C++:
do_some_stuff(arg1, arg2);

Pass by reference in C++:
do_some_stuff(arg1, arg2);

This is C++ being clever and hiding stuff from the programmer, which  
is Not Good(TM) for a kernel.  C++ may be an excellent language for  
userspace programmers (I say "may" here because some disagree,  
including myself), however, many of the features are extremely  
problematic for a kernel.


Cheers,
Kyle Moffett

--
Debugging is twice as hard as writing the code in the first place.   
Therefore, if you write the code as cleverly as possible, you are, by  
definition, not smart enough to debug it.
   -- Brian Kernighan



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09  0:59                                 ` Douglas McNaught
@ 2005-11-09  2:14                                   ` Dmitry Torokhov
  0 siblings, 0 replies; 131+ messages in thread
From: Dmitry Torokhov @ 2005-11-09  2:14 UTC (permalink / raw)
  To: Douglas McNaught
  Cc: linas, Kyle Moffett, Steven Rostedt, linux-kernel,
	bluesmoke-devel, linux-pci, linuxppc64-dev

On Tuesday 08 November 2005 19:59, Douglas McNaught wrote:
> linas <linas@austin.ibm.com> writes:
> 
> > On Tue, Nov 08, 2005 at 07:37:20PM -0500, Douglas McNaught was heard to remark:
> >> 
> >> Yeah, but if you're trying to read that code, you have to go look up
> >> the declaration to figure out whether it might affect 'foo' or not.
> >> And if you get it wrong, you get silent data corruption.
> >
> > No, that is not what "pass by reference" means. You are thinking of
> > "const", maybe, or "pass by value"; this is neither.  The arg is not 
> > declared const, the subroutine can (and usually will) modify the contents 
> > of the structure, and so the caller will be holding a modified structure
> > when the callee returns (just like it would if a pointer was passed).
> 
> Right.  My point is only that it's not clear from looking at the call
> site whether a struct passed by reference will be modified by the
> callee (some people pass by reference just for "efficiency").  And if
> the called function modifies the data without the caller's knowledge,
> it leads to obscure bugs.  Whereas if you pass a pointer, it's
> immediately clear that the called function can modify the pointed-to
> object.
>

A structure is almost never passed by value, no matter whether it is C
or C++. So both languages require you either use descriptive naming or
look up declaration/implementation:

C:
	struct str {
		char buf[1024];
		int count;
	};
	struct str s;

	do_something_with_s(&s);
	do_something_else_with_s(&s);

Which one modufies s?

-- 
Dmitry

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-08 23:33                         ` Steven Rostedt
@ 2005-11-09  9:22                           ` Bernd Petrovitsch
  0 siblings, 0 replies; 131+ messages in thread
From: Bernd Petrovitsch @ 2005-11-09  9:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linas, linux-kernel, bluesmoke-devel, linux-pci, johnrose,
	linuxppc64-dev

On Tue, 2005-11-08 at 18:33 -0500, Steven Rostedt wrote:
> On Tue, 2005-11-08 at 17:23 -0600, linas wrote:
> > On Mon, Nov 07, 2005 at 08:11:13PM -0500, Steven Rostedt was heard to remark:
> > > On Mon, 2005-11-07 at 14:41 -0600, linas wrote:
> > > 
> > > don't use typedef to get rid of "struct".
> > > 
> > > This was for the simple reason, too many developers were passing
> > > structures by value instead of by reference, just because they were
> > > using a type that they didn't realize was a structure. 
> > 
> > That's a rather bizarre mistake to make, since, in order to 
> > access a values in such a beast, you have to use a dot . instead 
> > of an arrow -> and so it hits ou in the face that you passed a value
> > instead of a reference.

And for every access of a field with a . you also look if it is not a
locally declared (small) struct?

> It happens when you access the variable via macros and other routines
> that you notice that takes and address of the variable, so you just pass
> in the address of the current local variable.
> > ----
> > Off-topic: There's actually a neat little trick in C++ that can 
> > help avoid accidentally passing null pointers.  One can declare 

And if you want a NULL-pointer equivalent, you declared a defined
null_blah object just to have a reference.
I've seen that often enough.
If want to avoid accidental NULL pointers, use "splint" or similar
tools. Or add an BUG_ON().

> > function declarations as:
> > 
> >   int func (sturct blah &v) {
> >     v.a ++; 
> >     return v.b;
> >   }
> > 
> > The ampersand says "pass argument by reference (so as to get arg passing
> > efficiency) but force coder to write code as if they were passing by value"
> > As a result, it gets difficult to pass null pointers (for reasons
> > similar to the difficulty of passing null pointers in Java (and yes,

See above for NULL-pointer equivalents.

> > I loathe Java, sorry to subject you to that))  Anyway, that's a C++ trick 
> > only; I wish it was in C so I could experiment more and find out if I 

No, it's not a trick butt an ordinary language feature.
And no, it was already in several Pascal's decades ago.

> > like it or hate it.
> 
> Actually, the true pass by reference (not by pointer) is one of the
> things that C++ has, that I wish C had.

No, you probably don't because if you forget one of the & in a call
chain (or even worse: it is removed for whatever bizarre reason), you
might get interesting bugs to hunt. And C++ is also much more creative
with temporaries which are simply thrown away afterwards.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09  0:42                             ` linas
@ 2005-11-09  9:25                               ` Bernd Petrovitsch
  0 siblings, 0 replies; 131+ messages in thread
From: Bernd Petrovitsch @ 2005-11-09  9:25 UTC (permalink / raw)
  To: linas
  Cc: Zan Lynx, David Gibson, Steven Rostedt, linuxppc64-dev,
	linux-pci, linux-kernel, bluesmoke-devel

On Tue, 2005-11-08 at 18:42 -0600, linas wrote:
[ C vs C++ ]
> It fundamentally changes coding style; you'd have to try it on some 
> mid-size project for at least a few months or longer to get into the
> mindset.  To make it all work, you also have to do other things, like 
> avoid mallocs and allocing on stack, which forces major changes of 
> style (because of the lifetime of things on stack). If you don't change 

The lifetime of the stack is AFAIK the same on C and C++. So there can't
be a significant difference.

> style to go with it, then you'll just end up in debug hell, in which
> case you'd be right: it would be a (very) bad idea.
> 
> (Disclaimer: I've moved away from C++ because of all the other
> opportunities for misuse that it offers and encourages.)

You that opportunities in all programming languages - in some more (perl
being probably the leader here), in some less (I don't know one).

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09  1:51                                 ` Kyle Moffett
@ 2005-11-09 10:16                                   ` J.A. Magallon
  2005-11-09 16:22                                     ` Vadim Lobanov
  0 siblings, 1 reply; 131+ messages in thread
From: J.A. Magallon @ 2005-11-09 10:16 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: linas, Douglas McNaught, Steven Rostedt, linux-kernel,
	bluesmoke-devel, linux-pci, linuxppc64-dev

[-- Attachment #1: Type: text/plain, Size: 2194 bytes --]

On Tue, 8 Nov 2005 20:51:25 -0500, Kyle Moffett <mrmacman_g4@mac.com> wrote:

> 
> Pass by value in C:
> do_some_stuff(arg1, arg2);
> 
> Pass by reference in C:
> do_some_stuff(&arg1, &arg2);
> 
> This is very obvious what it does.  The compiler does type-checks to  
> make sure you don't get it wrong.  There are tools to check stack  
> usage of functions too.  This is inherently obvious what the code  
> does without looking at a completely different file where the  
> function is defined.
> 
> 
> Pass by value in C++:
> do_some_stuff(arg1, arg2);
> 
> Pass by reference in C++:
> do_some_stuff(arg1, arg2);
> 
> This is C++ being clever and hiding stuff from the programmer, which  
> is Not Good(TM) for a kernel.  C++ may be an excellent language for  
> userspace programmers (I say "may" here because some disagree,  
> including myself), however, many of the features are extremely  
> problematic for a kernel.
> 

Why is it not good for kernel ?
You want to pass an struct to a function in the best way you can.
Reference just pases a pointer instead of copying, but you don't
realize.
If you want the funcion to be able to modify the struct, code it as

void do_some_stuff(T& arg1,T&  arg2)

If you DO NOT want the funcion to be able to modify the struct, code it as

void do_some_stuff(const T& arg1,const T& arg2)
This is far better than in C,. because you get the benefits from
reference pass without the problems of accidental modification of
pointer contents. And get rid of arrows -> ;).

If the function modifies the struct it should be obvious from its name,
not depending if you put an & in the call or not.
And you stop worrying about argument pass methods.
The person who programs the function decides and can even change it without
you user even noticing.
And gcc does nice optimizations when you mix const& and inlining...

--
J.A. Magallon <jamagallon()able!es>     \               Software is like sex:
werewolf!able!es                         \         It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 10:16                                   ` J.A. Magallon
@ 2005-11-09 16:22                                     ` Vadim Lobanov
  2005-11-09 19:20                                       ` linas
  0 siblings, 1 reply; 131+ messages in thread
From: Vadim Lobanov @ 2005-11-09 16:22 UTC (permalink / raw)
  To: J.A. Magallon
  Cc: Kyle Moffett, linas, Douglas McNaught, Steven Rostedt,
	linux-kernel, bluesmoke-devel, linux-pci, linuxppc64-dev

On Wed, 9 Nov 2005, J.A. Magallon wrote:

> On Tue, 8 Nov 2005 20:51:25 -0500, Kyle Moffett <mrmacman_g4@mac.com> wrote:
>
> >
> > Pass by value in C:
> > do_some_stuff(arg1, arg2);
> >
> > Pass by reference in C:
> > do_some_stuff(&arg1, &arg2);
> >
> > This is very obvious what it does.  The compiler does type-checks to
> > make sure you don't get it wrong.  There are tools to check stack
> > usage of functions too.  This is inherently obvious what the code
> > does without looking at a completely different file where the
> > function is defined.
> >
> >
> > Pass by value in C++:
> > do_some_stuff(arg1, arg2);
> >
> > Pass by reference in C++:
> > do_some_stuff(arg1, arg2);
> >
> > This is C++ being clever and hiding stuff from the programmer, which
> > is Not Good(TM) for a kernel.  C++ may be an excellent language for
> > userspace programmers (I say "may" here because some disagree,
> > including myself), however, many of the features are extremely
> > problematic for a kernel.
> >
>
> Why is it not good for kernel ?
> You want to pass an struct to a function in the best way you can.
> Reference just pases a pointer instead of copying, but you don't
> realize.
> If you want the funcion to be able to modify the struct, code it as
>
> void do_some_stuff(T& arg1,T&  arg2)
>
> If you DO NOT want the funcion to be able to modify the struct, code it as
>
> void do_some_stuff(const T& arg1,const T& arg2)

A diligent C programmer would write this as follows:
	void do_some_stuff (struct T * a, struct T * b);
versus
	void do_more_stuff (const struct T * a, const struct T * b);
So I don't see C++ winning at all here.

> This is far better than in C,. because you get the benefits from
> reference pass without the problems of accidental modification of
> pointer contents. And get rid of arrows -> ;).
>
> If the function modifies the struct it should be obvious from its name,
> not depending if you put an & in the call or not.
> And you stop worrying about argument pass methods.

I think I'll call this my rule #1:
The moment you stop worrying about something is the moment it bites you
in the butt. :-) Much firsthand experience.

> The person who programs the function decides and can even change it without
> you user even noticing.

And if the caller is passing in something that's not meant to be
modified, then the modification causes much badness. Happens with both
languages, too.

> And gcc does nice optimizations when you mix const& and inlining...

As far as I know, nothing stops GCC from doing the exact same
optimizations in the function prototypes given above.

>
> --
> J.A. Magallon <jamagallon()able!es>     \               Software is like sex:
> werewolf!able!es                         \         It's better when it's free
> Mandriva Linux release 2006.1 (Cooker) for i586
> Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))
>

-Vadim Lobanov

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 16:22                                     ` Vadim Lobanov
@ 2005-11-09 19:20                                       ` linas
  2005-11-09 19:36                                         ` thockin
                                                           ` (2 more replies)
  0 siblings, 3 replies; 131+ messages in thread
From: linas @ 2005-11-09 19:20 UTC (permalink / raw)
  To: Vadim Lobanov
  Cc: J.A. Magallon, Kyle Moffett, Douglas McNaught, Steven Rostedt,
	linux-kernel, bluesmoke-devel, linux-pci, linuxppc64-dev

On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> On Wed, 9 Nov 2005, J.A. Magallon wrote:
> 
> > void do_some_stuff(T& arg1,T&  arg2)
> 
> A diligent C programmer would write this as follows:
> 	void do_some_stuff (struct T * a, struct T * b);
> So I don't see C++ winning at all here.

I guess the real point that I'd wanted to make, and seems
to have gotten lost, was that by avoiding using pointers, 
you end up designing code in a very different way, and you
can find out that often/usually, you don't need structs
filled with a zoo of pointers.

Minimizing pointers is good: less ref counting is needed,
fewer mallocs are needed, fewer locks are needed 
(because of local/private scope!!), and null pointer 
deref errors are less likely. 

There are even performance implications: on modern CPU's
there's a very long pipeline to memory (hundreds of cycles 
for a cache miss! Really! Worse if you have run out of 
TLB entries!). So walking a long linked list chasing 
pointers can really really hurt performance.

By using refs instead of pointers, it helps you focus 
on the issue of "do I really need to store this pointer 
somewhere? Will I really need it later, or can I be done 
with it now?".

I don't know if the idea of "using fewer pointers" can
actually be carried out in the kernel. For starters,
the stack is way too short to be able to put much on it.

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 19:20                                       ` linas
@ 2005-11-09 19:36                                         ` thockin
  2005-11-09 19:38                                           ` linas
  2005-11-09 20:26                                         ` linux-os (Dick Johnson)
  2005-11-09 21:43                                         ` Vadim Lobanov
  2 siblings, 1 reply; 131+ messages in thread
From: thockin @ 2005-11-09 19:36 UTC (permalink / raw)
  To: linas
  Cc: Vadim Lobanov, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	Steven Rostedt, linux-kernel, bluesmoke-devel, linux-pci,
	linuxppc64-dev

On Wed, Nov 09, 2005 at 01:20:28PM -0600, linas wrote:
> I guess the real point that I'd wanted to make, and seems
> to have gotten lost, was that by avoiding using pointers, 
> you end up designing code in a very different way, and you
> can find out that often/usually, you don't need structs
> filled with a zoo of pointers.

Umm, references are implemented as pointers.  Instead of a "zoo of
pointers" you have a "zoo of references".  No functional difference.

> Minimizing pointers is good: less ref counting is needed,
> fewer mallocs are needed, fewer locks are needed 
> (because of local/private scope!!), and null pointer 
> deref errors are less likely. 

Not true at all!  If you're storing references you absolutley still need
reference counting.  Allocation non-trivial things on the stack is Bad
Idea in kernel land.


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 19:36                                         ` thockin
@ 2005-11-09 19:38                                           ` linas
  2005-11-09 20:39                                             ` thockin
  2005-11-09 20:55                                             ` Matthew Wilcox
  0 siblings, 2 replies; 131+ messages in thread
From: linas @ 2005-11-09 19:38 UTC (permalink / raw)
  To: thockin
  Cc: Vadim Lobanov, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	Steven Rostedt, linux-kernel, bluesmoke-devel, linux-pci,
	linuxppc64-dev

On Wed, Nov 09, 2005 at 11:36:25AM -0800, thockin@hockin.org was heard to remark:
> On Wed, Nov 09, 2005 at 01:20:28PM -0600, linas wrote:
> > I guess the real point that I'd wanted to make, and seems
> > to have gotten lost, was that by avoiding using pointers, 
> > you end up designing code in a very different way, and you
> > can find out that often/usually, you don't need structs
> > filled with a zoo of pointers.
> 
> Umm, references are implemented as pointers.  Instead of a "zoo of
> pointers" you have a "zoo of references".  No functional difference.

Sigh.

I think you are confusing references and pointers. By definition
you cannot "store a reference"; however, you can "dereference"
an object and store a pointer to it.

The C programming language conflates these two different ideas;
that is why they seem to be "the same thing" to you.

> > Minimizing pointers is good: less ref counting is needed,
> > fewer mallocs are needed, fewer locks are needed 
> > (because of local/private scope!!), and null pointer 
> > deref errors are less likely. 
> 
> Not true at all!  

Which part isn't true? 

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 19:20                                       ` linas
  2005-11-09 19:36                                         ` thockin
@ 2005-11-09 20:26                                         ` linux-os (Dick Johnson)
  2005-11-09 22:12                                           ` Vadim Lobanov
  2005-11-09 23:29                                           ` linas
  2005-11-09 21:43                                         ` Vadim Lobanov
  2 siblings, 2 replies; 131+ messages in thread
From: linux-os (Dick Johnson) @ 2005-11-09 20:26 UTC (permalink / raw)
  To: linas
  Cc: Vadim Lobanov, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	Steven Rostedt, linux-kernel, bluesmoke-devel, linux-pci,
	linuxppc64-dev

On Wed, 9 Nov 2005, linas wrote:

> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
>> On Wed, 9 Nov 2005, J.A. Magallon wrote:
>>
>>> void do_some_stuff(T& arg1,T&  arg2)
>>
>> A diligent C programmer would write this as follows:
>> 	void do_some_stuff (struct T * a, struct T * b);
>> So I don't see C++ winning at all here.
>
> I guess the real point that I'd wanted to make, and seems
> to have gotten lost, was that by avoiding using pointers,
> you end up designing code in a very different way, and you
> can find out that often/usually, you don't need structs
> filled with a zoo of pointers.
>

But you can't avoid pointers unless you make your entire
program have global scope. That may be great for performance,
but a killer if for have any bugs.

Procedures that get pointers to variables (including structures)
are a way of isolating faults. Without them, you can't test
the procedures in a working environment.

Also, without pointers, you are severely limited on the kinds
of libraries you can share. You certainly wouldn't want
to compile an entire C runtime library into your code so
that all the buffers have local scope.

> Minimizing pointers is good: less ref counting is needed,
> fewer mallocs are needed, fewer locks are needed
> (because of local/private scope!!), and null pointer
> deref errors are less likely.
>

No. Minimizing pointers should not be an objective. Properly
using the components of your tool-set should be. This means
that you use the correct access mode for various object
types. "Correct" depends upon the instant context, not upon
some company or personal rule.

> There are even performance implications: on modern CPU's
> there's a very long pipeline to memory (hundreds of cycles
> for a cache miss! Really! Worse if you have run out of
> TLB entries!). So walking a long linked list chasing
> pointers can really really hurt performance.
>

Linked lists are some of the necessary elements when one
doesn't know ahead of time the number of objects that must
be manipulated. They are just programming tools. You use
them when they are necessary. The fact that they use pointers
to make the links is not relevant.

> By using refs instead of pointers, it helps you focus
> on the issue of "do I really need to store this pointer
> somewhere? Will I really need it later, or can I be done
> with it now?".
>

Huh? References (at the opcode level) are pointers. There
is no difference whatsoever. For memory references, you
Get:
 	direct, direct+displacement, register_indirect,
 	register_indirect+displacement.

There isn't anything else. Some processors let you sum
displacements over several registers. Nevertheless, that's
all you have. Accessing a variable by reference is an
old artifact of FORTRAN. It can be efficient if the
architecture is flat and global so the compiler can
substitute direct access. In other words, no parameter
or pointer actually gets passed to the routine. The
compiler just remembers what the parameters actually
were and substitutes code to directly access the
parameters.

Not so in C++. With C++, "reference" is a user-shorthand
where the compiler actually accesses variables with pointers.
The rules don't prohibit C++ compilers from using FORTRAN-
like conventions for passing-by-reference. It's just that
nobody seems to do so.

> I don't know if the idea of "using fewer pointers" can
> actually be carried out in the kernel. For starters,
> the stack is way too short to be able to put much on it.
>
> --linas
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
Warning : 98.36% of all statistics are fiction.
.

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 19:38                                           ` linas
@ 2005-11-09 20:39                                             ` thockin
  2005-11-09 21:53                                               ` Andreas Schwab
  2005-11-09 20:55                                             ` Matthew Wilcox
  1 sibling, 1 reply; 131+ messages in thread
From: thockin @ 2005-11-09 20:39 UTC (permalink / raw)
  To: linas
  Cc: Vadim Lobanov, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	Steven Rostedt, linux-kernel, bluesmoke-devel, linux-pci,
	linuxppc64-dev

On Wed, Nov 09, 2005 at 01:38:28PM -0600, linas wrote:
> On Wed, Nov 09, 2005 at 11:36:25AM -0800, thockin@hockin.org was heard to remark:
> > Umm, references are implemented as pointers.  Instead of a "zoo of
> > pointers" you have a "zoo of references".  No functional difference.
> 
> Sigh.
> 
> I think you are confusing references and pointers. By definition
> you cannot "store a reference"; however, you can "dereference"
> an object and store a pointer to it.


Sigh, That's funny - I've written C++ code which has references as members
of objects.  You absolutely *can* store a reference.

References are simply a syntactic simplification to eliminate the
different pointer-dereference notation.  If they make you think about a
problem differently, that's fine, but they are really just pointers in
disguise.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 19:38                                           ` linas
  2005-11-09 20:39                                             ` thockin
@ 2005-11-09 20:55                                             ` Matthew Wilcox
  1 sibling, 0 replies; 131+ messages in thread
From: Matthew Wilcox @ 2005-11-09 20:55 UTC (permalink / raw)
  To: linas
  Cc: thockin, Vadim Lobanov, J.A. Magallon, Kyle Moffett,
	Douglas McNaught, Steven Rostedt, linux-kernel, bluesmoke-devel,
	linux-pci, linuxppc64-dev

On Wed, Nov 09, 2005 at 01:38:28PM -0600, linas wrote:
> On Wed, Nov 09, 2005 at 11:36:25AM -0800, thockin@hockin.org was heard to remark:
> > On Wed, Nov 09, 2005 at 01:20:28PM -0600, linas wrote:

SHUT UP!  SHUT UP ALL OF YOU!!

Or at least stop cc'ing linux-pci on this stupid wanking.

Thanks.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 19:20                                       ` linas
  2005-11-09 19:36                                         ` thockin
  2005-11-09 20:26                                         ` linux-os (Dick Johnson)
@ 2005-11-09 21:43                                         ` Vadim Lobanov
  2005-11-10  0:27                                           ` linas
  2 siblings, 1 reply; 131+ messages in thread
From: Vadim Lobanov @ 2005-11-09 21:43 UTC (permalink / raw)
  To: linas
  Cc: J.A. Magallon, Kyle Moffett, Douglas McNaught, Steven Rostedt,
	linux-kernel, bluesmoke-devel, linux-pci, linuxppc64-dev

On Wed, 9 Nov 2005, linas wrote:

> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> > On Wed, 9 Nov 2005, J.A. Magallon wrote:
> >
> > > void do_some_stuff(T& arg1,T&  arg2)
> >
> > A diligent C programmer would write this as follows:
> > 	void do_some_stuff (struct T * a, struct T * b);
> > So I don't see C++ winning at all here.
>
> I guess the real point that I'd wanted to make, and seems
> to have gotten lost, was that by avoiding using pointers,
> you end up designing code in a very different way, and you
> can find out that often/usually, you don't need structs
> filled with a zoo of pointers.
>
> Minimizing pointers is good: less ref counting is needed,
> fewer mallocs are needed, fewer locks are needed
> (because of local/private scope!!), and null pointer
> deref errors are less likely.
>
> There are even performance implications: on modern CPU's
> there's a very long pipeline to memory (hundreds of cycles
> for a cache miss! Really! Worse if you have run out of
> TLB entries!). So walking a long linked list chasing
> pointers can really really hurt performance.
>
> By using refs instead of pointers, it helps you focus
> on the issue of "do I really need to store this pointer
> somewhere? Will I really need it later, or can I be done
> with it now?".
>
> I don't know if the idea of "using fewer pointers" can
> actually be carried out in the kernel. For starters,
> the stack is way too short to be able to put much on it.

I really see the two issues at hand as being very much orthogonal to
each other.

Namely, you put data on the stack when you need it in the local
'context' only, whereas you put data globally when it needs to be
available globally. The C++ references are nothing more than syntactic
sugar (and we all know what they say about that and semicolons) for
pointers, and so I don't see how they would affect the choices at all.
Choosing where the data goes should be done according to the data's
lifetime, not the specifics of how functions are declared.

</soapbox>

> --linas
>
>

-Vadim Lobanov

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 20:39                                             ` thockin
@ 2005-11-09 21:53                                               ` Andreas Schwab
  2005-11-09 22:00                                                 ` Bernd Petrovitsch
  0 siblings, 1 reply; 131+ messages in thread
From: Andreas Schwab @ 2005-11-09 21:53 UTC (permalink / raw)
  To: thockin
  Cc: linas, Vadim Lobanov, J.A. Magallon, Kyle Moffett,
	Douglas McNaught, Steven Rostedt, linux-kernel, bluesmoke-devel,
	linux-pci, linuxppc64-dev

thockin@hockin.org writes:

> Sigh, That's funny - I've written C++ code which has references as members
> of objects.  You absolutely *can* store a reference.

You can _initialize_, but not _modify_ (reseat) it.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 21:53                                               ` Andreas Schwab
@ 2005-11-09 22:00                                                 ` Bernd Petrovitsch
  0 siblings, 0 replies; 131+ messages in thread
From: Bernd Petrovitsch @ 2005-11-09 22:00 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: thockin, linas, Vadim Lobanov, J.A. Magallon, Kyle Moffett,
	Douglas McNaught, Steven Rostedt, linux-kernel, bluesmoke-devel,
	linuxppc64-dev

On Wed, 2005-11-09 at 22:53 +0100, Andreas Schwab wrote:
> thockin@hockin.org writes:
> 
> > Sigh, That's funny - I've written C++ code which has references as members
> > of objects.  You absolutely *can* store a reference.
> 
> You can _initialize_, but not _modify_ (reseat) it.
                                          reset?
As in:
----  snip  ----
struct x {
	struct y * const p;
};
----  snip  ----

We assume that no one casts the "const" away.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services




^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 20:26                                         ` linux-os (Dick Johnson)
@ 2005-11-09 22:12                                           ` Vadim Lobanov
  2005-11-09 22:37                                             ` linux-os (Dick Johnson)
                                                               ` (2 more replies)
  2005-11-09 23:29                                           ` linas
  1 sibling, 3 replies; 131+ messages in thread
From: Vadim Lobanov @ 2005-11-09 22:12 UTC (permalink / raw)
  To: linux-os \(Dick Johnson\)
  Cc: linas, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	Steven Rostedt, linux-kernel, bluesmoke-devel, linux-pci,
	linuxppc64-dev

On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote:

>
> On Wed, 9 Nov 2005, linas wrote:
>
> > On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> >> On Wed, 9 Nov 2005, J.A. Magallon wrote:
> >>
> >>> void do_some_stuff(T& arg1,T&  arg2)
> >>
> >> A diligent C programmer would write this as follows:
> >> 	void do_some_stuff (struct T * a, struct T * b);
> >> So I don't see C++ winning at all here.
> >
> > I guess the real point that I'd wanted to make, and seems
> > to have gotten lost, was that by avoiding using pointers,
> > you end up designing code in a very different way, and you
> > can find out that often/usually, you don't need structs
> > filled with a zoo of pointers.
> >
>
> But you can't avoid pointers unless you make your entire
> program have global scope. That may be great for performance,
> but a killer if for have any bugs.

Just to extract some useful technical knowledge from the current ongoing
"flamewar"...
I'm not entirely sure if the above statement regarding performance is
correct. Some enlightenment would be appreciated.

Suppose you have the following code:
	int myvar;
	void foo (void) {
		printf("%d\n", myvar);
		bar();
		printf("%d\n", myvar);
	}
If bar is declared in _another_ file as
	void bar (void);
then I believe the compiler has to reread the global 'myvar' from memory
for the second printf().

However, if the code is as follows:
	void foo (void) {
		int myvar = 0;
		printf("%d\n", myvar);
		bar(&myvar);
		printf("%d\n", myvar);
	}
If bar is declared in _another_ file as
	void bar (const int * var);
then I think the compiler can validly cache the value of 'myvar' for the
second printf without re-reading it. Correct/incorrect?

-Vadim Lobanov

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 22:12                                           ` Vadim Lobanov
@ 2005-11-09 22:37                                             ` linux-os (Dick Johnson)
  2005-11-09 22:47                                               ` Vadim Lobanov
  2005-11-09 22:54                                               ` typedefs and structs - trim request doug thompson
  2005-11-09 23:29                                             ` typedefs and structs Andreas Schwab
  2005-11-10  8:15                                             ` J.A. Magallon
  2 siblings, 2 replies; 131+ messages in thread
From: linux-os (Dick Johnson) @ 2005-11-09 22:37 UTC (permalink / raw)
  To: Vadim Lobanov
  Cc: linas, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	Steven Rostedt, linux-kernel, bluesmoke-devel, linux-pci,
	linuxppc64-dev


On Wed, 9 Nov 2005, Vadim Lobanov wrote:

> On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote:
>
>>
>> On Wed, 9 Nov 2005, linas wrote:
>>
>>> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
>>>> On Wed, 9 Nov 2005, J.A. Magallon wrote:
>>>>
>>>>> void do_some_stuff(T& arg1,T&  arg2)
>>>>
>>>> A diligent C programmer would write this as follows:
>>>> 	void do_some_stuff (struct T * a, struct T * b);
>>>> So I don't see C++ winning at all here.
>>>
>>> I guess the real point that I'd wanted to make, and seems
>>> to have gotten lost, was that by avoiding using pointers,
>>> you end up designing code in a very different way, and you
>>> can find out that often/usually, you don't need structs
>>> filled with a zoo of pointers.
>>>
>>
>> But you can't avoid pointers unless you make your entire
>> program have global scope. That may be great for performance,
>> but a killer if for have any bugs.
>
> Just to extract some useful technical knowledge from the current ongoing
> "flamewar"...
> I'm not entirely sure if the above statement regarding performance is
> correct. Some enlightenment would be appreciated.
>
> Suppose you have the following code:
> 	int myvar;
> 	void foo (void) {
> 		printf("%d\n", myvar);
> 		bar();
> 		printf("%d\n", myvar);
> 	}
> If bar is declared in _another_ file as
> 	void bar (void);
> then I believe the compiler has to reread the global 'myvar' from memory
> for the second printf().
>

Correct because bar() could have modified (it's global).

> However, if the code is as follows:
> 	void foo (void) {
> 		int myvar = 0;
> 		printf("%d\n", myvar);
> 		bar(&myvar);
> 		printf("%d\n", myvar);
> 	}
> If bar is declared in _another_ file as
> 	void bar (const int * var);
> then I think the compiler can validly cache the value of 'myvar' for the
> second printf without re-reading it. Correct/incorrect?
>

Maybe you tried to trick me by showing the variable was not going
to be changed (const *). In that case, the compiler may not re-read
the variable. However, it can re-read the variable.

A "smart" compiler might just do: write(1, "0\n", 2);
... for the first printf() as well. Such compilers make
debugging difficult.

> -Vadim Lobanov
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
Warning : 98.36% of all statistics are fiction.
.

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 22:37                                             ` linux-os (Dick Johnson)
@ 2005-11-09 22:47                                               ` Vadim Lobanov
  2005-11-09 22:54                                               ` typedefs and structs - trim request doug thompson
  1 sibling, 0 replies; 131+ messages in thread
From: Vadim Lobanov @ 2005-11-09 22:47 UTC (permalink / raw)
  To: linux-os \(Dick Johnson\)
  Cc: linas, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	Steven Rostedt, linux-kernel, bluesmoke-devel, linuxppc64-dev

Trimmed linux-pci so as not to annoy those who don't want to listen to
all of this. Anyone else who wants off the CC list should yell also. :-)

On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote:

>
> On Wed, 9 Nov 2005, Vadim Lobanov wrote:
>
> > On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote:
> >
> >>
> >> On Wed, 9 Nov 2005, linas wrote:
> >>
> >>> On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> >>>> On Wed, 9 Nov 2005, J.A. Magallon wrote:
> >>>>
> >>>>> void do_some_stuff(T& arg1,T&  arg2)
> >>>>
> >>>> A diligent C programmer would write this as follows:
> >>>> 	void do_some_stuff (struct T * a, struct T * b);
> >>>> So I don't see C++ winning at all here.
> >>>
> >>> I guess the real point that I'd wanted to make, and seems
> >>> to have gotten lost, was that by avoiding using pointers,
> >>> you end up designing code in a very different way, and you
> >>> can find out that often/usually, you don't need structs
> >>> filled with a zoo of pointers.
> >>>
> >>
> >> But you can't avoid pointers unless you make your entire
> >> program have global scope. That may be great for performance,
> >> but a killer if for have any bugs.
> >
>
> Maybe you tried to trick me by showing the variable was not going
> to be changed (const *). In that case, the compiler may not re-read
> the variable. However, it can re-read the variable.
>
> A "smart" compiler might just do: write(1, "0\n", 2);
> ... for the first printf() as well. Such compilers make
> debugging difficult.

It wasn't meant to be a trick. In fact, I _want_ the compiler to cache
the myvar value in the second case -- I was merely wondering if there
was some fact that I overlooked that would prevent such an optimization.
The ultimate point of this was to show that globals can actually be
slower than locals (imagine the int in the example replaced by a
gigantic struct), contrary to the implication of your original
statement. :-)

>
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
> Warning : 98.36% of all statistics are fiction.
> .
>
> ****************************************************************
> The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.
>
> Thank you.
>

-Vadim Lobanov

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs - trim request
  2005-11-09 22:37                                             ` linux-os (Dick Johnson)
  2005-11-09 22:47                                               ` Vadim Lobanov
@ 2005-11-09 22:54                                               ` doug thompson
  1 sibling, 0 replies; 131+ messages in thread
From: doug thompson @ 2005-11-09 22:54 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Vadim Lobanov, linas, J.A. Magallon, Kyle Moffett,
	Douglas McNaught, Steven Rostedt, linux-kernel, linuxppc64-dev

Yes, trim off bluesmoke mailing list

thanks

doug thompson


> bluesmoke-devel mailing list
> bluesmoke-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bluesmoke-devel


^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 22:12                                           ` Vadim Lobanov
  2005-11-09 22:37                                             ` linux-os (Dick Johnson)
@ 2005-11-09 23:29                                             ` Andreas Schwab
  2005-11-09 23:40                                               ` Vadim Lobanov
  2005-11-10  8:15                                             ` J.A. Magallon
  2 siblings, 1 reply; 131+ messages in thread
From: Andreas Schwab @ 2005-11-09 23:29 UTC (permalink / raw)
  To: Vadim Lobanov
  Cc: linux-os \(Dick Johnson\),
	linas, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	Steven Rostedt, linux-kernel

Vadim Lobanov <vlobanov@speakeasy.net> writes:

> However, if the code is as follows:
> 	void foo (void) {
> 		int myvar = 0;
> 		printf("%d\n", myvar);
> 		bar(&myvar);
> 		printf("%d\n", myvar);
> 	}
> If bar is declared in _another_ file as
> 	void bar (const int * var);
> then I think the compiler can validly cache the value of 'myvar' for the
> second printf without re-reading it. Correct/incorrect?

Incorrect. bar() may cast away const.  In C const does not mean readonly.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 20:26                                         ` linux-os (Dick Johnson)
  2005-11-09 22:12                                           ` Vadim Lobanov
@ 2005-11-09 23:29                                           ` linas
  1 sibling, 0 replies; 131+ messages in thread
From: linas @ 2005-11-09 23:29 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Vadim Lobanov, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	Steven Rostedt, linux-kernel, bluesmoke-devel, linux-pci,
	linuxppc64-dev

On Wed, Nov 09, 2005 at 03:26:10PM -0500, linux-os (Dick Johnson) was heard to remark:
> 
> On Wed, 9 Nov 2005, linas wrote:
> 
> > On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> >> On Wed, 9 Nov 2005, J.A. Magallon wrote:
> >>
> >>> void do_some_stuff(T& arg1,T&  arg2)
> >>
> >> A diligent C programmer would write this as follows:
> >> 	void do_some_stuff (struct T * a, struct T * b);
> >> So I don't see C++ winning at all here.
> >
> > I guess the real point that I'd wanted to make, and seems
> > to have gotten lost, was that by avoiding using pointers,
> > you end up designing code in a very different way, and you
> > can find out that often/usually, you don't need structs
> > filled with a zoo of pointers.
> 
> But you can't avoid pointers unless you make your entire
> program have global scope. 

I didn't say you can avoid all pointers. I did say that 
for many projects, one can often avoid many pointers.  
And I certainly did not say that one needs global scope 
to do so. In fact, I said the opposite. 

> Also, without pointers, you are severely limited on the kinds
> of libraries you can share. 

I think you don't understand what a reference is. 
A reference is just like a pointer, except that the 
signature is different.  It has nothing to do with the
ability to create or use libraries, or to create/use 
modular code.  

I was trying to say that by focusing on the concept
of a "reference" as opposed to the concept of a 
"pointer", you can write code that is *more* modular, 
not less.

> > Minimizing pointers is good: less ref counting is needed,
> > fewer mallocs are needed, fewer locks are needed
> > (because of local/private scope!!), and null pointer
> > deref errors are less likely.
> 
> No. Minimizing pointers should not be an objective. 

Why not?

I've fixed hundreds of kernel bugs (which you don't 
see on this list because my fixes mostly go to the 
distros or other users) and nine out of ten of these 
are null-pointer derefs. 

Maybe I'm naive for thinking that "fewer pointers == 
fewer pointer bugs" but, hey its worth a shot.

> Properly
> using the components of your tool-set should be. 

What tool set are you refering to?  I am assuming 
that the code is 100% malleable: that one has
complete authority to redesign the way the system 
works, from the ground up.  If you do not have this 
freedom, but are forced to use someone-else's tool set,
then yes, you are SOL.  Furthermore, I would agree 
that mixing two different styles of coding in one 
project can lead to some nasty, ugly code.

> > By using refs instead of pointers, it helps you focus
> > on the issue of "do I really need to store this pointer
> > somewhere? Will I really need it later, or can I be done
> > with it now?".
> 
> Huh? References (at the opcode level) are pointers. There
> is no difference whatsoever. 

Yes, that is right.  I'm not talking about opcodes.
That's not what the conversation is about.

What I am trying to say is that many people design 
code in such a way that they need to store lots of 
pointers in an assortment of structs.

I wanted to emphasize that there are other ways of
designing code, which has smaller needs for pointers
(and that this can be done without loosing modularity,
testability, debugability, and it can be done without 
resorting to global variables.)

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 23:29                                             ` typedefs and structs Andreas Schwab
@ 2005-11-09 23:40                                               ` Vadim Lobanov
  2005-11-10  3:39                                                 ` Steven Rostedt
  0 siblings, 1 reply; 131+ messages in thread
From: Vadim Lobanov @ 2005-11-09 23:40 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: linux-os \\(Dick Johnson\\),
	linas, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	Steven Rostedt, linux-kernel

On Thu, 10 Nov 2005, Andreas Schwab wrote:

> Vadim Lobanov <vlobanov@speakeasy.net> writes:
>
> > However, if the code is as follows:
> > 	void foo (void) {
> > 		int myvar = 0;
> > 		printf("%d\n", myvar);
> > 		bar(&myvar);
> > 		printf("%d\n", myvar);
> > 	}
> > If bar is declared in _another_ file as
> > 	void bar (const int * var);
> > then I think the compiler can validly cache the value of 'myvar' for the
> > second printf without re-reading it. Correct/incorrect?
>
> Incorrect. bar() may cast away const.  In C const does not mean readonly.

In that case, I stand corrected.

Is there any real reason to apply const to pointer targets, aside from
giving yourself a warning in the case you try to write the pointer
target directly? Seems to be a missed opportunity for optimizations
where the coder designates that it's okay to do so.

> Andreas.
>
> --
> Andreas Schwab, SuSE Labs, schwab@suse.de
> SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
> PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
>

-Vadim Lobanov

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 21:43                                         ` Vadim Lobanov
@ 2005-11-10  0:27                                           ` linas
  0 siblings, 0 replies; 131+ messages in thread
From: linas @ 2005-11-10  0:27 UTC (permalink / raw)
  To: Vadim Lobanov
  Cc: J.A. Magallon, Kyle Moffett, Douglas McNaught, Steven Rostedt,
	linux-kernel, linuxppc64-dev

On Wed, Nov 09, 2005 at 01:43:10PM -0800, Vadim Lobanov was heard to remark:
> On Wed, 9 Nov 2005, linas wrote:
> 
> > I guess the real point that I'd wanted to make, and seems
> > to have gotten lost, was that by avoiding using pointers,
> > you end up designing code in a very different way, and you
> > can find out that often/usually, you don't need structs
> > filled with a zoo of pointers.
> >
> > Minimizing pointers is good: less ref counting is needed,
> > fewer mallocs are needed, fewer locks are needed
> > (because of local/private scope!!), and null pointer
> > deref errors are less likely.
> >
> > There are even performance implications: on modern CPU's
> > there's a very long pipeline to memory (hundreds of cycles
> > for a cache miss! Really! Worse if you have run out of
> > TLB entries!). So walking a long linked list chasing
> > pointers can really really hurt performance.
> >
> > By using refs instead of pointers, it helps you focus
> > on the issue of "do I really need to store this pointer
> > somewhere? Will I really need it later, or can I be done
> > with it now?".
> >
> > I don't know if the idea of "using fewer pointers" can
> > actually be carried out in the kernel. For starters,
> > the stack is way too short to be able to put much on it.
> 
> I really see the two issues at hand as being very much orthogonal to
> each other.

Yes. I accidentally linked them, see below.

> Namely, you put data on the stack when you need it in the local
> 'context' only, whereas you put data globally when it needs to be
> available globally. 

Yes. But there's some flexibility.

> The C++ references are nothing more than syntactic
> sugar (and we all know what they say about that and semicolons) for
> pointers,

Yes.

> and so I don't see how they would affect the choices at all.
> Choosing where the data goes should be done according to the data's
> lifetime, not the specifics of how functions are declared.

My apologies for linking the idea of references to fewer pointers.
They're not linked, except in how I discovered them.

I once had a project (that used threads, so it was "kernel-like",
in that race conditions had to be dealt with).  One day, for the 
the hell of it, I decided to create a struct and keep it on the 
stack, instead of mallocing it.  Since this struct was accessed 
only by a few small, well-defined routines that did not keep any 
pointers to it, this worked just fine. And skipping the malloc/free
felt good, so I liked it.

Then I thought that maybe I could push the idea, see how far I 
could go. Well, of course, the code was filled with various 
objects, all of which *seemed* to be (or seemed to need to be) 
long-lifetime objects. And they all stored pointers to one-another,
since they all needed to get access to one-another at various points,
for various reasons. 

Well, I really wanted to alloc objects on stack, and so that forced
me to think about how to get rid of pointers (since a pointer to 
an object on stack is deadly). And that forced me to think about
lifetime. And some of this thinking was quite hard.  But encouraged
by some modest success at first, I found that I was able to 
eliminate almost all the pointers, and almost all the mallocs 
(maybe several dozen of each, scattered accross maybe several dozen
structs). And I was flabbergasted, since the resulting program
actually got smaller in the process, and faster. And the 
null-pointer derefs vanished. 

Now, maybe this was specific to the project, and can't be replicated
elsewhere.  But this was a communcations daemon: it basically 
was a pool of threads, each thread handling a long-lived,
stateful "session" of requests and responses from some remote server,
and so while its not the kernel, that's a reasonably complex thing. 

I'm not crazy enough to suggest that one could do the same thing 
in the Linux kernel, since one probably can't, but now that we're 
here and all, it does make me wonder.  FWIW, the two designs of
the commo daemon were radically different; things that were sliced 
one way got reworked to flow and be handled in a completely 
different order.  You can't just get rid of pointers with some 
trivial restructuring; you have to figure out how not to need them.

--linas

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 23:40                                               ` Vadim Lobanov
@ 2005-11-10  3:39                                                 ` Steven Rostedt
  2005-11-10  3:49                                                   ` Vadim Lobanov
  0 siblings, 1 reply; 131+ messages in thread
From: Steven Rostedt @ 2005-11-10  3:39 UTC (permalink / raw)
  To: Vadim Lobanov
  Cc: Andreas Schwab, linux-os \\(Dick Johnson\\),
	linas, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	linux-kernel

On Wed, 2005-11-09 at 15:40 -0800, Vadim Lobanov wrote:
> On Thu, 10 Nov 2005, Andreas Schwab wrote:
> 
> > Vadim Lobanov <vlobanov@speakeasy.net> writes:
> >
> > > However, if the code is as follows:
> > > 	void foo (void) {
> > > 		int myvar = 0;
> > > 		printf("%d\n", myvar);
> > > 		bar(&myvar);
> > > 		printf("%d\n", myvar);
> > > 	}
> > > If bar is declared in _another_ file as
> > > 	void bar (const int * var);
> > > then I think the compiler can validly cache the value of 'myvar' for the
> > > second printf without re-reading it. Correct/incorrect?
> >
> > Incorrect. bar() may cast away const.  In C const does not mean readonly.
> 
> In that case, I stand corrected.
> 
> Is there any real reason to apply const to pointer targets, aside from
> giving yourself a warning in the case you try to write the pointer
> target directly? Seems to be a missed opportunity for optimizations
> where the coder designates that it's okay to do so.

Actually, where are you going to cache it? In a register? but calling
bar() may use that register, so it would be stored on the stack anyway.
I doubt that this is a problem with the compiler, since if bar _is_
small, then myvar is most likely already in the processor's cache to
begin with, so it wouldn't need to go back out to memory, unless it was
modified.

-- Steve



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-10  3:39                                                 ` Steven Rostedt
@ 2005-11-10  3:49                                                   ` Vadim Lobanov
  0 siblings, 0 replies; 131+ messages in thread
From: Vadim Lobanov @ 2005-11-10  3:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andreas Schwab, linux-os \\\\(Dick Johnson\\\\),
	linas, J.A. Magallon, Kyle Moffett, Douglas McNaught,
	linux-kernel

On Wed, 9 Nov 2005, Steven Rostedt wrote:

> On Wed, 2005-11-09 at 15:40 -0800, Vadim Lobanov wrote:
> > On Thu, 10 Nov 2005, Andreas Schwab wrote:
> >
> > > Vadim Lobanov <vlobanov@speakeasy.net> writes:
> > >
> > > > However, if the code is as follows:
> > > > 	void foo (void) {
> > > > 		int myvar = 0;
> > > > 		printf("%d\n", myvar);
> > > > 		bar(&myvar);
> > > > 		printf("%d\n", myvar);
> > > > 	}
> > > > If bar is declared in _another_ file as
> > > > 	void bar (const int * var);
> > > > then I think the compiler can validly cache the value of 'myvar' for the
> > > > second printf without re-reading it. Correct/incorrect?
> > >
> > > Incorrect. bar() may cast away const.  In C const does not mean readonly.
> >
> > In that case, I stand corrected.
> >
> > Is there any real reason to apply const to pointer targets, aside from
> > giving yourself a warning in the case you try to write the pointer
> > target directly? Seems to be a missed opportunity for optimizations
> > where the coder designates that it's okay to do so.
>
> Actually, where are you going to cache it? In a register? but calling
> bar() may use that register, so it would be stored on the stack anyway.

May, but not necessarily will.

> I doubt that this is a problem with the compiler, since if bar _is_
> small, then myvar is most likely already in the processor's cache to
> begin with, so it wouldn't need to go back out to memory, unless it was
> modified.

You're right, however. There's very few cases where such an optimization
would be useful, due to register constraints.

> -- Steve
>
>

-Vadim Lobanov

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-09 22:12                                           ` Vadim Lobanov
  2005-11-09 22:37                                             ` linux-os (Dick Johnson)
  2005-11-09 23:29                                             ` typedefs and structs Andreas Schwab
@ 2005-11-10  8:15                                             ` J.A. Magallon
  2005-11-10 13:27                                               ` Nikita Danilov
  2 siblings, 1 reply; 131+ messages in thread
From: J.A. Magallon @ 2005-11-10  8:15 UTC (permalink / raw)
  To: Vadim Lobanov; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2493 bytes --]

On Wed, 9 Nov 2005 14:12:38 -0800 (PST), Vadim Lobanov <vlobanov@speakeasy.net> wrote:

> On Wed, 9 Nov 2005, linux-os \(Dick Johnson\) wrote:
> 
> >
> > On Wed, 9 Nov 2005, linas wrote:
> >
> > > On Wed, Nov 09, 2005 at 08:22:15AM -0800, Vadim Lobanov was heard to remark:
> > >> On Wed, 9 Nov 2005, J.A. Magallon wrote:
> > >>
> > >>> void do_some_stuff(T& arg1,T&  arg2)
> > >>
> > >> A diligent C programmer would write this as follows:
> > >> 	void do_some_stuff (struct T * a, struct T * b);
> > >> So I don't see C++ winning at all here.
> > >
> > > I guess the real point that I'd wanted to make, and seems
> > > to have gotten lost, was that by avoiding using pointers,
> > > you end up designing code in a very different way, and you
> > > can find out that often/usually, you don't need structs
> > > filled with a zoo of pointers.
> > >
> >
> > But you can't avoid pointers unless you make your entire
> > program have global scope. That may be great for performance,
> > but a killer if for have any bugs.
> 
> Just to extract some useful technical knowledge from the current ongoing
> "flamewar"...
> I'm not entirely sure if the above statement regarding performance is
> correct. Some enlightenment would be appreciated.
> 
> Suppose you have the following code:
> 	int myvar;
> 	void foo (void) {
> 		printf("%d\n", myvar);
> 		bar();
> 		printf("%d\n", myvar);
> 	}
> If bar is declared in _another_ file as
> 	void bar (void);
> then I believe the compiler has to reread the global 'myvar' from memory
> for the second printf().
> 
> However, if the code is as follows:
> 	void foo (void) {
> 		int myvar = 0;
> 		printf("%d\n", myvar);
> 		bar(&myvar);
> 		printf("%d\n", myvar);
> 	}
> If bar is declared in _another_ file as
> 	void bar (const int * var);
> then I think the compiler can validly cache the value of 'myvar' for the
> second printf without re-reading it. Correct/incorrect?
> 

Nope. You can't trust bar() not doing something like

bar(const int* local_var)
{
   ... use local_var as ro...
   extern int myvar;
   myvar = 7;
}

For the compiler to do that, you must tag bar() with attribute(pure).

--
J.A. Magallon <jamagallon()able!es>     \               Software is like sex:
werewolf!able!es                         \         It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-10  8:15                                             ` J.A. Magallon
@ 2005-11-10 13:27                                               ` Nikita Danilov
  2005-11-10 14:18                                                 ` linux-os (Dick Johnson)
  2005-11-10 19:21                                                 ` Kyle Moffett
  0 siblings, 2 replies; 131+ messages in thread
From: Nikita Danilov @ 2005-11-10 13:27 UTC (permalink / raw)
  To: J.A. Magallon; +Cc: linux-kernel

J.A. Magallon writes:

[...]

 > > 
 > > However, if the code is as follows:
 > > 	void foo (void) {
 > > 		int myvar = 0;
 > > 		printf("%d\n", myvar);
 > > 		bar(&myvar);
 > > 		printf("%d\n", myvar);
 > > 	}
 > > If bar is declared in _another_ file as
 > > 	void bar (const int * var);
 > > then I think the compiler can validly cache the value of 'myvar' for the
 > > second printf without re-reading it. Correct/incorrect?
 > > 
 > 
 > Nope. You can't trust bar() not doing something like
 > 
 > bar(const int* local_var)
 > {
 >    ... use local_var as ro...
 >    extern int myvar;
 >    myvar = 7;
 > }
 > 
 > For the compiler to do that, you must tag bar() with attribute(pure).

extern declaration in your version of bar() cannot refer to the
automatic variable myvar in foo().

 > 
 > --
 > J.A. Magallon <jamagallon()able!es>     \               Software is like sex:
 > werewolf!able!es                         \         It's better when it's free
 > Mandriva Linux release 2006.1 (Cooker) for i586
 > Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))

Nikita.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-10 13:27                                               ` Nikita Danilov
@ 2005-11-10 14:18                                                 ` linux-os (Dick Johnson)
  2005-11-10 19:21                                                 ` Kyle Moffett
  1 sibling, 0 replies; 131+ messages in thread
From: linux-os (Dick Johnson) @ 2005-11-10 14:18 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: J.A. Magallon, Linux kernel

On Thu, 10 Nov 2005, Nikita Danilov wrote:

> J.A. Magallon writes:
>
> [...]
>
> > >
> > > However, if the code is as follows:
> > > 	void foo (void) {
> > > 		int myvar = 0;
> > > 		printf("%d\n", myvar);
> > > 		bar(&myvar);
> > > 		printf("%d\n", myvar);
> > > 	}
> > > If bar is declared in _another_ file as
> > > 	void bar (const int * var);
> > > then I think the compiler can validly cache the value of 'myvar' for the
> > > second printf without re-reading it. Correct/incorrect?
> > >
> >
> > Nope. You can't trust bar() not doing something like
> >
> > bar(const int* local_var)
> > {
> >    ... use local_var as ro...
> >    extern int myvar;
> >    myvar = 7;
> > }
> >
> > For the compiler to do that, you must tag bar() with attribute(pure).
>
> extern declaration in your version of bar() cannot refer to the
> automatic variable myvar in foo().

Also, just because some called function may use casts to write
to a variable behind a void pointer doesn't mean that the function's
code must respect the potential of such buggy code.

The compiler must properly assume that the local variable, 'myvar'
will never be modified through the void pointer, even though some
jerk's buggy code may actually do that. Infortunately most protection
is page-based so there isn't any way for an OS to seg-fault somebody
who writes through void pointers.

The only hint that somebody did the wrong thing is that the wrong
value got printed. Because 'C' allows the use of casts, which
allows anything to be cast to anything else (may have to cheat by
first casting to void), the 'C' language not only allows you to
shoot yourself in the foot, but also hands you all the tools
necessary to do that, including loading and cocking the trigger.

Code inspection is sometimes necessary to avoid such problems.
In particular one must be on the lookout for 'excessive' casts.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
Warning : 98.36% of all statistics are fiction.
.

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-10 13:27                                               ` Nikita Danilov
  2005-11-10 14:18                                                 ` linux-os (Dick Johnson)
@ 2005-11-10 19:21                                                 ` Kyle Moffett
  2005-11-10 19:28                                                   ` Vadim Lobanov
  1 sibling, 1 reply; 131+ messages in thread
From: Kyle Moffett @ 2005-11-10 19:21 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: J.A. Magallon, linux-kernel

On Nov 10, 2005, at 08:27:18, Nikita Danilov wrote:
> extern declaration in your version of bar() cannot refer to the  
> automatic variable myvar in foo().

int foo;

void bar(const int *local_var) {
     foo = *local_var + 1;
}

void show(void) {
     printf("%d\n", foo);
     bar(&foo);
     printf("%d\n", foo);
}

If GCC thought it could arbitrarily cache anything it wanted to, then  
code like this would die.  There is a whole mess of code in GCC  
designed specifically to watch for and avoid aliasing issues like these.

Cheers,
Kyle Moffett

--
Debugging is twice as hard as writing the code in the first place.   
Therefore, if you write the code as cleverly as possible, you are, by  
definition, not smart enough to debug it.
   -- Brian Kernighan



^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-10 19:21                                                 ` Kyle Moffett
@ 2005-11-10 19:28                                                   ` Vadim Lobanov
  2005-11-10 20:53                                                     ` J.A. Magallon
  0 siblings, 1 reply; 131+ messages in thread
From: Vadim Lobanov @ 2005-11-10 19:28 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Nikita Danilov, J.A. Magallon, linux-kernel

On Thu, 10 Nov 2005, Kyle Moffett wrote:

> On Nov 10, 2005, at 08:27:18, Nikita Danilov wrote:
> > extern declaration in your version of bar() cannot refer to the
> > automatic variable myvar in foo().
>
> int foo;

Except foo is not an automatic variable within show(). When the variable
is global, then there's no argument. It was a question of when foo is
local and bar() would get a const pointer to the local.

> void bar(const int *local_var) {
>      foo = *local_var + 1;
> }
>
> void show(void) {
>      printf("%d\n", foo);
>      bar(&foo);
>      printf("%d\n", foo);
> }
>
> If GCC thought it could arbitrarily cache anything it wanted to, then
> code like this would die.  There is a whole mess of code in GCC
> designed specifically to watch for and avoid aliasing issues like these.
>
> Cheers,
> Kyle Moffett
>
> --
> Debugging is twice as hard as writing the code in the first place.
> Therefore, if you write the code as cleverly as possible, you are, by
> definition, not smart enough to debug it.
>    -- Brian Kernighan
>
>

-Vadim Lobanov

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs
  2005-11-10 19:28                                                   ` Vadim Lobanov
@ 2005-11-10 20:53                                                     ` J.A. Magallon
  0 siblings, 0 replies; 131+ messages in thread
From: J.A. Magallon @ 2005-11-10 20:53 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1458 bytes --]

On Thu, 10 Nov 2005 11:28:27 -0800 (PST), Vadim Lobanov <vlobanov@speakeasy.net> wrote:

> On Thu, 10 Nov 2005, Kyle Moffett wrote:
> 
> > On Nov 10, 2005, at 08:27:18, Nikita Danilov wrote:
> > > extern declaration in your version of bar() cannot refer to the
> > > automatic variable myvar in foo().
> >
> > int foo;
> 
> Except foo is not an automatic variable within show(). When the variable
> is global, then there's no argument. It was a question of when foo is
> local and bar() would get a const pointer to the local.
> 
> > void bar(const int *local_var) {
> >      foo = *local_var + 1;
> > }
> >
> > void show(void) {
> >      printf("%d\n", foo);
> >      bar(&foo);
> >      printf("%d\n", foo);
> > }
> >

You can tweak it all as you want:

int* foo_p;

void bar(const int *local_var) {
  *foo_p = *local_var + 1;
}

int main() {
  int foo;
  foo_p = &foo;
  printf("%d\n", foo);
  bar(&foo);
  printf("%d\n", foo);

  return 0;
}

The fact is that gcc can't suppose anything unless you say explicitly it can.
That is why 'pure' and 'const' __attributes__ exist, it supposes aliasing
by default and so on.

--
J.A. Magallon <jamagallon()able!es>     \               Software is like sex:
werewolf!able!es                         \         It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-11-08  1:18                       ` Neil Brown
  2005-11-08 23:36                         ` typedefs and structs linas
@ 2005-12-16 13:09                         ` Denis Vlasenko
  2005-12-16 13:22                           ` Matthew Wilcox
  1 sibling, 1 reply; 131+ messages in thread
From: Denis Vlasenko @ 2005-12-16 13:09 UTC (permalink / raw)
  To: Neil Brown
  Cc: Steven Rostedt, linas, linux-kernel, bluesmoke-devel, linux-pci,
	johnrose, linuxppc64-dev, Paul Mackerras, Greg KH

On Tuesday 08 November 2005 03:18, Neil Brown wrote:
> On Monday November 7, rostedt@goodmis.org wrote:
> > 
> > This was for the simple reason, too many developers were passing
> > structures by value instead of by reference, just because they were
> > using a type that they didn't realize was a structure. And to make
> > things worse, these structures started to get bigger.
> > 
> 
> Another reason  for not using typedefs is that if you do, and you want
> to refer to the structure in some other include file, you have to
> #include the include file that devices the structure.
> If you don't use typedefs, you can just say:
> 
>    struct foo;

Forward decl for typedef works too:

typedef struct foo foo_t;

is ok even before struct foo is defined. Not sure that standards
allow thing, but gcc does.

> and the compiler will happily wait for the complete definition later
> (providing it doesn't need the size in the meanwhile). 
> So avoiding typedef means that you can sometimes avoid excess
> #includes, which means faster compiling.
--
vda

^ permalink raw reply	[flat|nested] 131+ messages in thread

* Re: typedefs and structs [was Re: [PATCH 16/42]: PCI:  PCI Error reporting callbacks]
  2005-12-16 13:09                         ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] Denis Vlasenko
@ 2005-12-16 13:22                           ` Matthew Wilcox
  0 siblings, 0 replies; 131+ messages in thread
From: Matthew Wilcox @ 2005-12-16 13:22 UTC (permalink / raw)
  To: Denis Vlasenko
  Cc: Neil Brown, Steven Rostedt, linas, linux-kernel, bluesmoke-devel,
	linux-pci, johnrose, linuxppc64-dev, Paul Mackerras, Greg KH

On Fri, Dec 16, 2005 at 03:09:01PM +0200, Denis Vlasenko wrote:
> 
> Forward decl for typedef works too:
> 
> typedef struct foo foo_t;
> 
> is ok even before struct foo is defined. Not sure that standards
> allow thing, but gcc does.

Forward declarations of typedefs don't work in at least one case that
do for struct definitions:

$ cat foo.c
typedef struct foo foo_t;
typedef struct foo foo_t;
$ gcc -Wall -o foo.o -c foo.c
foo.c:2: error: redefinition of typedef 'foo_t'
foo.c:1: error: previous declaration of 'foo_t' was here

and if you don't believe we do that, take another look at our headers
sometime.

^ permalink raw reply	[flat|nested] 131+ messages in thread

end of thread, other threads:[~2005-12-16 13:22 UTC | newest]

Thread overview: 131+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-11-03 23:59 [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Linas Vepstas
2005-11-04  0:42 ` [PATCH 1/42] ppc64: uniform usage of bus unit id interfaces linas
2005-11-04  0:47 ` [PATCH 2/42]: ppc64: misc minor cleanup Linas Vepstas
2005-11-04  0:48 ` [PATCH 3/42]: ppc64: PCI address cache minor fixes Linas Vepstas
2005-11-04  0:48 ` [PATCH 4/42]: ppc64: PCI error rate statistics Linas Vepstas
2005-11-04  0:49 ` [PATCH 5/42]: ppc64: RTAS error reporting restructuring Linas Vepstas
2005-11-04  0:49 ` [PATCH 6/42]: ppc64: avoid PCI error reporting for empty slots Linas Vepstas
2005-11-04  0:49 ` [PATCH 7/42]: ppc64: serialize reports of PCI errors Linas Vepstas
2005-11-04  0:49 ` [PATCH 8/42]: ppc64: escape hatch for spinning interrupt deadlocks Linas Vepstas
2005-11-04  0:49 ` [PATCH 9/42]: ppc64: bugfix: crash on PCI hotplug Linas Vepstas
2005-11-04  0:49 ` [PATCH 10/42]: ppc64: bugfix: don't silently gnore PCI errors Linas Vepstas
2005-11-04  0:49 ` [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64 Linas Vepstas
2005-11-04  0:50 ` [PATCH 12/42]: ppc64: PCI error event dispatcher Linas Vepstas
2005-11-04  0:50 ` [PATCH 13/42]: ppc64: PCI reset support routines Linas Vepstas
2005-11-04  0:50 ` [PATCH 14/42]: ppc64: Save & restore of PCI device BARS Linas Vepstas
2005-11-04  0:50 ` [PATCH 15/42]: Documentation: PCI Error Recovery Linas Vepstas
2005-11-04  0:50 ` [PATCH 16/42]: PCI: PCI Error reporting callbacks Linas Vepstas
2005-11-05  6:11   ` Greg KH
2005-11-06 23:25     ` Paul Mackerras
2005-11-07 17:55       ` linas
2005-11-07 18:27         ` Greg KH
2005-11-07 18:56           ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] linas
2005-11-07 19:02             ` Greg KH
2005-11-07 19:36               ` linas
2005-11-07 20:02                 ` Greg KH
2005-11-07 20:41                   ` linas
2005-11-07 20:46                     ` Greg KH
2005-11-08  1:11                     ` Steven Rostedt
2005-11-08  1:18                       ` Neil Brown
2005-11-08 23:36                         ` typedefs and structs linas
2005-12-16 13:09                         ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] Denis Vlasenko
2005-12-16 13:22                           ` Matthew Wilcox
2005-11-08 23:23                       ` typedefs and structs linas
2005-11-08 23:33                         ` Steven Rostedt
2005-11-09  9:22                           ` Bernd Petrovitsch
2005-11-08 23:57                         ` Kyle Moffett
2005-11-09  0:30                           ` linas
2005-11-09  0:37                             ` Douglas McNaught
2005-11-09  0:48                               ` linas
2005-11-09  0:59                                 ` Douglas McNaught
2005-11-09  2:14                                   ` Dmitry Torokhov
2005-11-09  1:51                                 ` Kyle Moffett
2005-11-09 10:16                                   ` J.A. Magallon
2005-11-09 16:22                                     ` Vadim Lobanov
2005-11-09 19:20                                       ` linas
2005-11-09 19:36                                         ` thockin
2005-11-09 19:38                                           ` linas
2005-11-09 20:39                                             ` thockin
2005-11-09 21:53                                               ` Andreas Schwab
2005-11-09 22:00                                                 ` Bernd Petrovitsch
2005-11-09 20:55                                             ` Matthew Wilcox
2005-11-09 20:26                                         ` linux-os (Dick Johnson)
2005-11-09 22:12                                           ` Vadim Lobanov
2005-11-09 22:37                                             ` linux-os (Dick Johnson)
2005-11-09 22:47                                               ` Vadim Lobanov
2005-11-09 22:54                                               ` typedefs and structs - trim request doug thompson
2005-11-09 23:29                                             ` typedefs and structs Andreas Schwab
2005-11-09 23:40                                               ` Vadim Lobanov
2005-11-10  3:39                                                 ` Steven Rostedt
2005-11-10  3:49                                                   ` Vadim Lobanov
2005-11-10  8:15                                             ` J.A. Magallon
2005-11-10 13:27                                               ` Nikita Danilov
2005-11-10 14:18                                                 ` linux-os (Dick Johnson)
2005-11-10 19:21                                                 ` Kyle Moffett
2005-11-10 19:28                                                   ` Vadim Lobanov
2005-11-10 20:53                                                     ` J.A. Magallon
2005-11-09 23:29                                           ` linas
2005-11-09 21:43                                         ` Vadim Lobanov
2005-11-10  0:27                                           ` linas
2005-11-08 23:57                         ` David Gibson
2005-11-09  0:13                           ` Zan Lynx
2005-11-09  0:42                             ` linas
2005-11-09  9:25                               ` Bernd Petrovitsch
2005-11-07 19:04             ` typedefs and structs [was Re: [PATCH 16/42]: PCI: PCI Error reporting callbacks] Randy.Dunlap
2005-11-07 19:57           ` [PATCH 1/7]: PCI revised [PATCH 16/42]: PCI: PCI Error reporting callbacks linas
2005-11-07 19:59             ` Christoph Hellwig
2005-11-07 20:03             ` Greg KH
2005-11-07 21:21               ` [PATCH 1/7]: PCI revised (2) " linas
2005-11-07 21:37                 ` Greg KH
2005-11-07 21:54                   ` Linus Torvalds
2005-11-07 22:54                     ` Greg KH
2005-11-07 22:43                   ` [PATCH 1/7]: PCI revised (3) " linas
2005-11-07 22:53                     ` Greg KH
2005-11-07 23:19                       ` linas
2005-11-08  2:43                         ` Greg KH
2005-11-07 21:30             ` [PATCH 2/7]: Revised [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver linas
2005-11-07 21:40               ` Brian King
2005-11-07 22:03                 ` linas
2005-11-07 21:31             ` [PATCH 3/7]: Revised [PATCH 28/42]: SCSI: add PCI error recovery to Symbios " linas
2005-11-07 21:34             ` [PATCH 4/7]: Revised [PATCH 29/42]: ethernet: add PCI error recovery to e100 " linas
2005-11-07 21:36             ` [PATCH: 5/7]: Revised: [PATCH 30/42]: ethernet: add PCI error recovery to e1000 " linas
2005-11-07 21:37             ` [PATCH 6/7]: Revised [PATCH 31/42]: ethernet: add PCI error recovery to ixgb " linas
2005-11-07 21:39             ` [PATCH 7/7]: Revised [PATCH 32/42]: RFC: Add compile-time config options linas
2005-11-04  0:50 ` [PATCH 17/42]: ppc64: mark failed devices Linas Vepstas
2005-11-04  0:51 ` [PATCH 18/42]: ppc64: bugfix: crash on dlpar slot add, remove Linas Vepstas
2005-11-04  0:51 ` [PATCH 19/42]: ppc64: bugfix: crash on PHB add Linas Vepstas
2005-11-04 16:20   ` John Rose
2005-11-04 16:35     ` linas
2005-11-04  0:51 ` [PATCH 20/42]: ppc64: PCI hotplug common code elimination Linas Vepstas
2005-11-04  0:51 ` [PATCH 21/42]: PCI: cleanup/simplify ppc64 PCI hotplug code Linas Vepstas
2005-11-04  0:52 ` [PATCH 22/42]: PCI: remove duplicted pci " Linas Vepstas
2005-11-04 21:54   ` John Rose
2005-11-04  0:52 ` [PATCH 23/42]: ppc64: migrate common PCI " Linas Vepstas
2005-11-04  0:52 ` [PATCH 24/42]: ppc64: PCI Error Recovery: PPC64 core recovery routines Linas Vepstas
2005-11-04  0:53 ` [PATCH 25/42]: ppc64: Split out PCI address cache to its own file Linas Vepstas
2005-11-04  0:53 ` [PATCH 26/42]: ppc64: Add "partion endpoint" support Linas Vepstas
2005-11-04  0:53 ` [PATCH 27/42]: SCSI: add PCI error recovery to IPR dev driver Linas Vepstas
2005-11-04  0:53 ` [PATCH 28/42]: SCSI: add PCI error recovery to Symbios " Linas Vepstas
2005-11-04  0:53 ` [PATCH 29/42]: ethernet: add PCI error recovery to e100 " Linas Vepstas
2005-11-04  1:34   ` Jesse Brandeburg
2005-11-04  1:51     ` Jesse Brandeburg
2005-11-04  0:54 ` [PATCH 30/42]: ethernet: add PCI error recovery to e1000 " Linas Vepstas
2005-11-04  0:54 ` [PATCH 31/42]: ethernet: add PCI error recovery to ixgb " Linas Vepstas
2005-11-04  0:54 ` [PATCH 32/42]: RFC: Add compile-time config options Linas Vepstas
2005-11-04  0:54 ` [PATCH 33/42]: ppc64: remove bogus printk Linas Vepstas
2005-11-04  0:54 ` [PATCH 34/42]: ppc64: Remove duplicate code Linas Vepstas
2005-11-04  0:54 ` [PATCH 35/42]: ppc64: bugfix: fill in un-initialzed field Linas Vepstas
2005-11-04  0:54 ` [PATCH 36/42]: ppc64: Use PE configuration address consistently Linas Vepstas
2005-11-04  0:54 ` [PATCH 37/42]: ppc64: set up the RTAS token just like the rest of them Linas Vepstas
2005-11-04  0:54 ` [PATCH 38/42]: ppc64: Don't continue with PCI Error recovery if slot reset failed Linas Vepstas
2005-11-04  0:55 ` [PATCH 39/42]: ppc64: handle multifunction PCI devices properly Linas Vepstas
2005-11-04  0:55 ` [PATCH 40/42]: ppc64: IOMMU: don't ioremap null pointers Linas Vepstas
2005-11-04  0:55 ` [PATCH 41/42]: ppc64: Save device BARS much earlier in the boot sequence Linas Vepstas
2005-11-04 22:14   ` linas
2005-11-04  0:55 ` [PATCH 42/42]: ppc64: get rid of per_cpu counters Linas Vepstas
2005-11-04  0:57 ` [PATCH 11/42]: ppc64: move code to powerpc directory from ppc64 Linas Vepstas
2005-11-04 22:14 ` [PATCH 0/42] PCI Error Recovery for PPC64 and misc device drivers Greg KH
2005-11-05  0:08   ` Paul Mackerras
2005-11-05  0:28     ` Greg KH
2005-11-05  0:46       ` Paul Mackerras
2005-11-05  1:28         ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).