linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] readX_check() - Interface for PCI-X error recovery
@ 2004-04-06 11:04 Hidetoshi Seto
  2004-04-06 11:51 ` Matthew Wilcox
  2004-04-07  2:08 ` Grant Grundler
  0 siblings, 2 replies; 5+ messages in thread
From: Hidetoshi Seto @ 2004-04-06 11:04 UTC (permalink / raw)
  To: Linux Kernel mailing list, Linux IA64 Mailing List; +Cc: Hironobu Ishii

Hi, all,


This is a request for comments.

We are considering about PCI-X error recovery.

What we deal as a problem here is that there is some cases that 
a PCI error which is recoverable by a device driver is only passed to 
a kernel but not to the driver, and that it results a system down 
because the kernel cannot determine a relevant driver which should 
recover the error.
For example, in the case of PCI-X on ia64, PCI write error is recoverable 
because the error is recorded in a status register, and the proper driver 
can read the register after its transaction. However, PCI read error 
become a system down because the error cause a MCA(Machine Check Abort) 
interruption to the kernel instantly, but the kernel has no idea.


We have post our ideas for this recovery before now.
This time, we accede to Linus's idea and reconsider our interfaces.

 Seto: "[RFC] How drivers notice a HW error?" (readX_check() I/F)
   http://marc.theaimsgroup.com/?l=linux-kernel&m=106992207709400&w=2

 Ishii: "[RFC/PATCH, 1/4] readX_check() performance evaluation"
   http://marc.theaimsgroup.com/?l=linux-kernel&m=107525559029806&w=2

 Linus: "Re: [RFC/PATCH, 2/4] readX_check() performance evaluation"
   http://marc.theaimsgroup.com/?l=linux-kernel&m=107525904702681&w=2


We are supposing the use of these interfaces will be in some fundamental 
drivers such as for SCSI or NIC, and now we are working for implementation.

We would appreciate if you point us any problem of this or something you 
noticed about this.


Thanks,

H.Seto

-----

Abstract is as following:


- Resources newly required:

    on struct device:
	error flag
	list of recoverable physical address regions
	pointer to host bridge of the device

    on per_cpu:
	list of currently checking devices


- Interfaces newly required:

    clear_pcix_errors(dev)
	Clear the error flag of the dev, and start to check the device.
	This also clears the status register of its host bridge.

    readX_check(dev,vaddr)
	Read a register of the device mapped to vaddr, and check errors 
	if possible(This is depending on its architecture. In the case of 
	ia64, we can generate a MCA from an error by simple operation to 
	test the read data.)
	If any error happen on the recoverable region, set the error flag.

    read_pcix_errors(dev)
	Return whether here was an error or not referring the error flag in 
	struct device and the status register of the host bridge, and stop 
	checking of the device.

    register_pcix_region(dev,paddr,len)
    unregister_pcix_region(dev)
	Register/Unregister a physical address region that the driver can 
	recover. This function would be in the init/cleanup of the driver.


- Flow

  xx_probe(dev, ...)
  {
    ..
	register_pcix_region(dev,paddr,len);
    ..
  }

  xx_intr()
  {
	spin_lock_irqsave();		/* disable interrupt & preemption */

	clear_pcix_errors(dev);		/* clear error flag */
	{
	  ..
	    x = readX_check(dev, vaddr);
	  ..
	}
	error = read_pcix_errors(dev);	/* read error flag */
	if (error)
		recovery_or_offline(dev);

	spin_lock_irqrestore();		/* enable interrupt & preemption */
  }

  xx_remove()
  {
    ..
	unregister_pcix_region(dev) ;
    ..
  }


- Note

    on clear_pcix_errors():
	If the status of the host bridge indicates an error when it let be 
	clear, it should be cleared after setting each flags of all devices 
	under the bridge. This is for other clear-read block on other cpu.

    on ia64:
	We can generate a MCA from a poisoned data synchronously in 
	readX_check(), so it is required a MCA handler to check the error 
	and deal error flags in struct device.

    on ia32:
	Since ia32's MCE(Machine Check Exception) is asynchronous, it's 
	hard to check errors in readX_check(). So we check it by reading 
	the status register of host bridge in read_pcix_errors().

    on other arch:
	if it is not possible on the arch, do like:
	#ifndef CONFIG_PCI_RECOVERY
	  #define clear_pcix_errors(dev) do { } while (0)
	  #define read_pcix_errors(dev)  (0)
	#endif



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] readX_check() - Interface for PCI-X error recovery
  2004-04-06 11:04 [RFC] readX_check() - Interface for PCI-X error recovery Hidetoshi Seto
@ 2004-04-06 11:51 ` Matthew Wilcox
  2004-04-06 16:15   ` Jesse Barnes
  2004-04-07  9:34   ` Hidetoshi Seto
  2004-04-07  2:08 ` Grant Grundler
  1 sibling, 2 replies; 5+ messages in thread
From: Matthew Wilcox @ 2004-04-06 11:51 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Linux Kernel mailing list, Linux IA64 Mailing List, Hironobu Ishii

On Tue, Apr 06, 2004 at 08:04:49PM +0900, Hidetoshi Seto wrote:
> - Resources newly required:
> 
>     on struct device:
> 	error flag
> 	list of recoverable physical address regions

Can't you just use the pci_dev->resource regions for this?

> 	pointer to host bridge of the device

Again, there's already ways of getting to this from the pci_dev.
Simply wander up through pdev->bus->self until you get to a self that
is NULL and you've found a root bus.  Alternatively, you might just look
at PCI_CONTROLLER() on ia64.

>     on per_cpu:
> 	list of currently checking devices
> 
> 
> - Interfaces newly required:
> 
>     clear_pcix_errors(dev)
> 	Clear the error flag of the dev, and start to check the device.
> 	This also clears the status register of its host bridge.

For consistency, how about naming these functions pci_clear_errors()
and pci_check_errors()?  PCI-Express has similar error-checking abilities
and I'd hate to see two extremely similar capabilities at war with each
other over unacceoptable names ;-)

>     readX_check(dev,vaddr)
> 	Read a register of the device mapped to vaddr, and check errors 
> 	if possible(This is depending on its architecture. In the case of 
> 	ia64, we can generate a MCA from an error by simple operation to 
> 	test the read data.)
> 	If any error happen on the recoverable region, set the error flag.

I really don't think we want another readX variant.  Do we then also
add readX_check_relaxed()?  Can't we just pretend the MCA is asynchronous
on ia64?  I'm sure we'd get better performance.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] readX_check() - Interface for PCI-X error recovery
  2004-04-06 11:51 ` Matthew Wilcox
@ 2004-04-06 16:15   ` Jesse Barnes
  2004-04-07  9:34   ` Hidetoshi Seto
  1 sibling, 0 replies; 5+ messages in thread
From: Jesse Barnes @ 2004-04-06 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Matthew Wilcox, Hidetoshi Seto, Linux IA64 Mailing List, Hironobu Ishii

> >     readX_check(dev,vaddr)
> > 	Read a register of the device mapped to vaddr, and check errors
> > 	if possible(This is depending on its architecture. In the case of
> > 	ia64, we can generate a MCA from an error by simple operation to
> > 	test the read data.)
> > 	If any error happen on the recoverable region, set the error flag.
>
> I really don't think we want another readX variant.  Do we then also
> add readX_check_relaxed()?  Can't we just pretend the MCA is asynchronous
> on ia64?  I'm sure we'd get better performance.

Hmm.. I wonder if we could get away with not having a new readX interface by 
registering each PCI resource either at driver init time or in arch code with 
the MCA hander.  Then we could just make the read routines use the variable 
that was just read to try to flush out the MCA (there may be better ways to 
do this).  E.g.

arch_pci_scan()
{
	...
	for_each_pci_resource(dev, res) {
		check_region(res);
	}
	...
}

...

unsigned char readb(unsigned long addr)
{
	unsigned char val = *(volatile unsigned char *)addr;
#ifdef CONFIG_PCI_CHECK
	/* try to flush out the MCA by doing something with val */
#endif
	return val;
}

...

Then presumably the MCA error handler would see that an MCA occurred in a 
region registered during PCI initialization and return an error for 
pci_read_errors(dev);

Jesse

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] readX_check() - Interface for PCI-X error recovery
  2004-04-06 11:04 [RFC] readX_check() - Interface for PCI-X error recovery Hidetoshi Seto
  2004-04-06 11:51 ` Matthew Wilcox
@ 2004-04-07  2:08 ` Grant Grundler
  1 sibling, 0 replies; 5+ messages in thread
From: Grant Grundler @ 2004-04-07  2:08 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Linux Kernel mailing list, Linux IA64 Mailing List, Hironobu Ishii

On Tue, Apr 06, 2004 at 08:04:49PM +0900, Hidetoshi Seto wrote:
...
> - Resources newly required:
> 
>     on struct device:
> 	error flag
> 	list of recoverable physical address regions

I think willy already pointed out the struct pci_dev->resources[] should point 
to MMIO adress ranges the PCI(-X) device "owns".

> 	pointer to host bridge of the device

ditto. I think willy meant walk pci_dev->bus->parent until bus->self
is NULL normally should work.

>     on per_cpu:
> 	list of currently checking devices

Why wouldn't this be per PCI host bus controller instead of per CPU?

I'm thinking SMP - any CPU could access any MMIO region.
How those MMIO addresses get routed depends on how PCI Host Bus
controllers are programmed. Ie Bridges have to route MMIO transactions
to children (PCI devices).

> - Interfaces newly required:
> 
>     clear_pcix_errors(dev)
> 	Clear the error flag of the dev, and start to check the device.
> 	This also clears the status register of its host bridge.
> 
>     readX_check(dev,vaddr)
> 	Read a register of the device mapped to vaddr, and check errors 
> 	if possible(This is depending on its architecture. In the case of 
> 	ia64, we can generate a MCA from an error by simple operation to 
> 	test the read data.)
> 	If any error happen on the recoverable region, set the error flag.
>
>     read_pcix_errors(dev)
> 	Return whether here was an error or not referring the error flag in 
> 	struct device and the status register of the host bridge, and stop 
> 	checking of the device.
> 
>     register_pcix_region(dev,paddr,len)
>     unregister_pcix_region(dev)
> 	Register/Unregister a physical address region that the driver can 
> 	recover. This function would be in the init/cleanup of the driver.

Could register/unregister functionality be part of the clear/read errors?

> 
> 
> - Flow
> 
>   xx_probe(dev, ...)
>   {
>     ..
> 	register_pcix_region(dev,paddr,len);
>     ..
>   }
> 
>   xx_intr()
>   {
> 	spin_lock_irqsave();		/* disable interrupt & preemption */
> 
> 	clear_pcix_errors(dev);		/* clear error flag */
> 	{
> 	  ..
> 	    x = readX_check(dev, vaddr);
> 	  ..
> 	}
> 	error = read_pcix_errors(dev);	/* read error flag */
> 	if (error)
> 		recovery_or_offline(dev);
> 
> 	spin_lock_irqrestore();		/* enable interrupt & preemption */

Holding the spinlock might not be realistic for most drivers
during a recovery - ie re-init the HW. I understand this is a
generic example but it's a bit too simple...


...
>     on other arch:
> 	if it is not possible on the arch, do like:
> 	#ifndef CONFIG_PCI_RECOVERY
> 	  #define clear_pcix_errors(dev) do { } while (0)
> 	  #define read_pcix_errors(dev)  (0)
> 	#endif

And need similar for the register/unregister/readX_check() calls.

thanks,
grant

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] readX_check() - Interface for PCI-X error recovery
  2004-04-06 11:51 ` Matthew Wilcox
  2004-04-06 16:15   ` Jesse Barnes
@ 2004-04-07  9:34   ` Hidetoshi Seto
  1 sibling, 0 replies; 5+ messages in thread
From: Hidetoshi Seto @ 2004-04-07  9:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linux Kernel mailing list, Linux IA64 Mailing List, Hironobu Ishii

Matthew Wilcox <willy@debian.org> wrote:

> > 	list of recoverable physical address regions
> 
> Can't you just use the pci_dev->resource regions for this?

If there is a list separated usual one, driver can choice which region 
to check or not. However, in general, maybe almost all driver will regist 
all region it has.
Agree, I'll use the pci_dev->resource. This is a realistic approach.

> > 	pointer to host bridge of the device
> 
> Again, there's already ways of getting to this from the pci_dev.

I see, this wouldn't cost so much. I'll use.

> >     clear_pcix_errors(dev)
> 
> For consistency, how about naming these functions pci_clear_errors()
> and pci_check_errors()?  PCI-Express has similar error-checking abilities
> and I'd hate to see two extremely similar capabilities at war with each
> other over unacceoptable names ;-)

Sounds good. I think these naming are more better for peace porpose :-)

> >     readX_check(dev,vaddr)
> 
> I really don't think we want another readX variant.  Do we then also
> add readX_check_relaxed()?  Can't we just pretend the MCA is asynchronous
> on ia64?  I'm sure we'd get better performance.

On ia64, readX_check() will be used to make sure that there has no MCA before 
read_pcix_errors(). I think we could use original plain readX() in place of 
readX_check() if we ignore checking accuracy or if the driver takes care to 
all loaded data. But I don't know how about this on other architecture.

I think inside of readX_check() would be mostly arch specified codes.


Thanks,

H.Seto


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-04-07  9:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-04-06 11:04 [RFC] readX_check() - Interface for PCI-X error recovery Hidetoshi Seto
2004-04-06 11:51 ` Matthew Wilcox
2004-04-06 16:15   ` Jesse Barnes
2004-04-07  9:34   ` Hidetoshi Seto
2004-04-07  2:08 ` Grant Grundler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).