linux-kernel.vger.kernel.org archive mirror
* Re: de2104x: interrupts before interrupt handler is registered
       [not found] ` <5NnDE-44v-11@gated-at.bofh.it>
@ 2006-03-07  0:02   ` Robert Hancock
  2006-03-07 12:07     ` linux-os (Dick Johnson)
  0 siblings, 1 reply; 32+ messages in thread
From: Robert Hancock @ 2006-03-07  0:02 UTC (permalink / raw)
  To: linux-kernel

linux-os (Dick Johnson) wrote:
> This started to happen in a lot of PCI drivers once it became
> necessary to call pci_enable_device() in order to make the
> returned IRQ values valid. This has been reported numerous
> times and has not been fixed. Basically, in order to get
> the correct value, one needs to disable the board in some
> unspecified way so it is not possible for it to generate
> an interrupt before enabling the board. With some devices
> this may not be possible!

What kind of board behaves that way? pci_enable_device just enables the 
device BARs and wakes it up if it was suspended, I should think that any 
device that starts generating interrupts from that must be quite broken..

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-07  0:02   ` de2104x: interrupts before interrupt handler is registered Robert Hancock
@ 2006-03-07 12:07     ` linux-os (Dick Johnson)
  2006-03-07 13:58       ` Robert Hancock
  0 siblings, 1 reply; 32+ messages in thread
From: linux-os (Dick Johnson) @ 2006-03-07 12:07 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel


On Mon, 6 Mar 2006, Robert Hancock wrote:

> linux-os (Dick Johnson) wrote:
>> This started to happen in a lot of PCI drivers once it became
>> necessary to call pci_enable_device() in order to make the
>> returned IRQ values valid. This has been reported numerous
>> times and has not been fixed. Basically, in order to get
>> the correct value, one needs to disable the board in some
>> unspecified way so it is not possible for it to generate
>> an interrupt before enabling the board. With some devices
>> this may not be possible!
>
> What kind of board behaves that way? pci_enable_device just enables the
> device BARs and wakes it up if it was suspended, I should think that any
> device that starts generating interrupts from that must be quite broken..
>
> --
> Robert Hancock      Saskatoon, SK, Canada
> To email, remove "nospam" from hancockr@nospamshaw.ca
> Home Page: http://www.roberthancock.com/

No. It would be good if that was true. Unfortunately, the IRQ
returned before the pci_enable_device() is not correct. It
gets re-written by pci_enable_device().


Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-07 12:07     ` linux-os (Dick Johnson)
@ 2006-03-07 13:58       ` Robert Hancock
  2006-03-07 14:21         ` linux-os (Dick Johnson)
  0 siblings, 1 reply; 32+ messages in thread
From: Robert Hancock @ 2006-03-07 13:58 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: linux-kernel

linux-os (Dick Johnson) wrote:
> No. It would be good if that was true. Unfortunately, the IRQ
> returned before the pci_enable_device() is not correct. It
> gets re-written by pci_enable_device().

That wasn't what I meant; yes, that is true in the current kernel.
However, any device which would start generating interrupts just because
its BARs got enabled by pci_enable_device seems broken.

The driver needs to request the interrupt after the device is enabled, 
and only after that can it enable the device to generate interrupts.
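
The ordering described here can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct mock_dev and the mock_* functions are hypothetical stand-ins for pci_enable_device(), request_irq() and the device's own interrupt-enable register. They are not kernel APIs and do not touch real hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a PCI function; not kernel API. */
struct mock_dev {
    bool enabled;       /* set by mock_enable_device() */
    int  irq;           /* -1 until the device has been enabled */
    bool handler_set;   /* true once a handler is registered */
    bool irq_unmasked;  /* device-side interrupt enable bit */
};

/* Stand-in for pci_enable_device(): only now does the IRQ become valid. */
static int mock_enable_device(struct mock_dev *d, int routed_irq)
{
    d->enabled = true;
    d->irq = routed_irq;
    return 0;
}

/* Stand-in for request_irq(): refuses an IRQ that is not routed yet. */
static int mock_request_irq(struct mock_dev *d)
{
    if (!d->enabled || d->irq < 0)
        return -1;
    d->handler_set = true;
    return 0;
}

/* Stand-in for writing the device's own interrupt-enable register. */
static int mock_unmask_device_irqs(struct mock_dev *d)
{
    if (!d->handler_set)
        return -1;      /* would risk an interrupt nobody handles */
    d->irq_unmasked = true;
    return 0;
}

/* Probe order: enable the device, request the IRQ, then unmask. */
static int mock_probe(struct mock_dev *d, int routed_irq)
{
    assert(d);
    if (mock_enable_device(d, routed_irq))
        return -1;
    if (mock_request_irq(d))
        return -1;
    return mock_unmask_device_irqs(d);
}
```

In this model a request made before the enable step fails because the IRQ number is still invalid, which is the constraint both sides of the thread agree on.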

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-07 13:58       ` Robert Hancock
@ 2006-03-07 14:21         ` linux-os (Dick Johnson)
  2006-03-07 17:51           ` Bjorn Helgaas
  2006-03-08  0:00           ` Robert Hancock
  0 siblings, 2 replies; 32+ messages in thread
From: linux-os (Dick Johnson) @ 2006-03-07 14:21 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel


On Tue, 7 Mar 2006, Robert Hancock wrote:

> linux-os (Dick Johnson) wrote:
>> No. It would be good if that was true. Unfortunately, the IRQ
>> returned before the pci_enable_device() is not correct. It
>> gets re-written by pci_enable_device().
>
> That wasn't what I meant; yes, that is true in the current kernel.
> However, any device which would start generating interrupts just because
> its BARs got enabled by pci_enable_device seems broken.

Thinking that a device powers ON in a stable state is naive. Many
complex devices will have FPGA devices with floating pins that don't
become stable until their contents are loaded serially. Others will
have IRQ requests based upon power-on states that need to be cleared
with a software reset. One can't issue a software reset until the
device is enabled and enabling the device may generate interrupts
with no handler in place so you have a "can't get there from here"
problem. Linux-2.4.x had IRQs that were stable. One could put
a handler in place that would handle the possible burst of interrupts
upon startup. Then this was changed so the IRQ value is wrong
until an unrelated and illogical event occurs. Now, you need to
make work-arounds that were never before necessary. My request
to fix this fell upon deaf ears.

>
> The driver needs to request the interrupt after the device is enabled,
> and only after that can it enable the device to generate interrupts.
>
> --
> Robert Hancock      Saskatoon, SK, Canada
> To email, remove "nospam" from hancockr@nospamshaw.ca
> Home Page: http://www.roberthancock.com/
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-07 14:21         ` linux-os (Dick Johnson)
@ 2006-03-07 17:51           ` Bjorn Helgaas
  2006-03-07 18:17             ` linux-os (Dick Johnson)
  2006-03-08  8:18             ` Jesse Brandeburg
  2006-03-08  0:00           ` Robert Hancock
  1 sibling, 2 replies; 32+ messages in thread
From: Bjorn Helgaas @ 2006-03-07 17:51 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: Robert Hancock, linux-kernel

On Tuesday 07 March 2006 07:21, linux-os (Dick Johnson) wrote:
> Thinking that a device powers ON in a stable state is naive. Many
> complex devices will have FPGA devices with floating pins that don't
> become stable until their contents are loaded serially. Others will
> have IRQ requests based upon power-on states that need to be cleared
> with a software reset. One can't issue a software reset until the
> device is enabled and enabling the device may generate interrupts
> with no handler in place so you have a "can't get there from here"
> problem.

Maybe you could handle this with a PCI quirk that runs before
pci_enable_device().  IIRC, we considered exposing a separate
interface for PCI IRQ allocation and routing, but decided it
wasn't worth the complexity since so few devices would need it.

> Linux-2.4.x had IRQs that were stable. One could put 
> a handler in place that would handle the possible burst of interrupts
> upon startup. Then this was changed so the IRQ value is wrong
> until an unrelated and illogical event occurs.

There are good reasons to wait to allocate the IRQ until you have
a driver that cares about the device.  I'm sorry that this broke
your specific case.

Bjorn


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-07 17:51           ` Bjorn Helgaas
@ 2006-03-07 18:17             ` linux-os (Dick Johnson)
  2006-03-08  0:05               ` Robert Hancock
  2006-03-08  8:18             ` Jesse Brandeburg
  1 sibling, 1 reply; 32+ messages in thread
From: linux-os (Dick Johnson) @ 2006-03-07 18:17 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Robert Hancock, linux-kernel


On Tue, 7 Mar 2006, Bjorn Helgaas wrote:

> On Tuesday 07 March 2006 07:21, linux-os (Dick Johnson) wrote:
>> Thinking that a device powers ON in a stable state is naive. Many
>> complex devices will have FPGA devices with floating pins that don't
>> become stable until their contents are loaded serially. Others will
>> have IRQ requests based upon power-on states that need to be cleared
>> with a software reset. One can't issue a software reset until the
>> device is enabled and enabling the device may generate interrupts
>> with no handler in place so you have a "can't get there from here"
>> problem.
>
> Maybe you could handle this with a PCI quirk that runs before
> pci_enable_device().  IIRC, we considered exposing a separate
> interface for PCI IRQ allocation and routing, but decided it
> wasn't worth the complexity since so few devices would need it.
>

The problem is that I can't write device internal registers to
put the device into a stable state without enabling the device.
So, the "fix" (read hack) was to mask off all possible interrupts
in the ioapic, call pci_enable_device(), initialize the device,
clear any pending hardware interrupts on the device, then reenable
the ioapic interrupts. I couldn't just use a spin-lock because
somebody complains and the machine panics.
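
The workaround sequence above (mask the line, enable, initialize, clear, unmask) can be modelled roughly as follows. This is a hedged userspace sketch, not the actual driver code: struct line_model and every function in it are hypothetical, and the "device asserts as soon as it is enabled" behaviour is an assumption taken from the description in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one interrupt-controller line and one device. */
struct line_model {
    bool line_masked;    /* masked at the interrupt controller */
    bool dev_enabled;
    bool dev_asserting;  /* device is pulling its interrupt line */
    int  unhandled;      /* interrupts delivered with no handler */
};

/* Deliver the interrupt if the device asserts and the line is open. */
static void deliver(struct line_model *m)
{
    if (m->dev_asserting && !m->line_masked)
        m->unhandled++;
}

/* Assumed behaviour: enabling wakes a device with a stale request. */
static void enable_device(struct line_model *m)
{
    m->dev_enabled = true;
    m->dev_asserting = true;
    deliver(m);
}

/* Software reset: only possible once the device is enabled. */
static int reset_device(struct line_model *m)
{
    if (!m->dev_enabled)
        return -1;
    m->dev_asserting = false;
    return 0;
}

/* Mask, enable, initialize and clear, then unmask. */
static int bring_up_masked(struct line_model *m)
{
    assert(m);
    m->line_masked = true;
    enable_device(m);
    if (reset_device(m))
        return -1;
    m->line_masked = false;
    deliver(m);
    return 0;
}
```

With the line masked for the whole bring-up, the model records no unhandled interrupt; calling enable_device() alone on an open line records one. The model ignores other devices sharing the same line, which would also be blocked while the line is masked.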

>> Linux-2.4.x had IRQs that were stable. One could put
>> a handler in place that would handle the possible burst of interrupts
>> upon startup. Then this was changed so the IRQ value is wrong
>> until an unrelated and illogical event occurs.
>
> There are good reasons to wait to allocate the IRQ until you have
> a driver that cares about the device.  I'm sorry that this broke
> your specific case.
>
> Bjorn

There are now other "standard" boards that seem to be experiencing
the same problem. Maybe it is time to make a procedure that turns
off interrupts for a specific device (not an unknown IRQ). Then
a subsequent call turns them on after the handler is in place.
This wouldn't affect current drivers. They would still turn on
hot by default.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-07 14:21         ` linux-os (Dick Johnson)
  2006-03-07 17:51           ` Bjorn Helgaas
@ 2006-03-08  0:00           ` Robert Hancock
  2006-03-08 12:03             ` linux-os (Dick Johnson)
  1 sibling, 1 reply; 32+ messages in thread
From: Robert Hancock @ 2006-03-08  0:00 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: linux-kernel

linux-os (Dick Johnson) wrote:
> Thinking that a device powers ON in a stable state is naive.

I don't think so.. if you build a device that connects to the PCI bus it 
had better come up in a stable state if it wants to be compliant with 
the spec. That's what the reset line and power-up reset interval is for.

> Many
> complex devices will have FPGA devices with floating pins that don't
> become stable until their contents are loaded serially. Others will
> have IRQ requests based upon power-on states that need to be cleared
> with a software reset. One can't issue a software reset until the
> device is enabled and enabling the device may generate interrupts
> with no handler in place so you have a "can't get there from here"
> problem.

You still aren't seeing my point. Why does enabling the device BARs 
cause the device to generate interrupts? And if there's something you 
need to do to prevent the device from generating interrupts, how can you 
do it without enabling the device?

Also, the device's ISR must clear the condition which is causing the 
interrupt, otherwise interrupt storms will result. If your device can 
enter a state where the interrupt cannot be reliably cleared, how can 
you possibly comply with this?

> Linux-2.4.x had IRQs that were stable. One could put
> a handler in place that would handle the possible burst of interrupts
> upon startup. Then this was changed so the IRQ value is wrong
> until an unrelated and illogical event occurs. Now, you need to
> make work-arounds that were never before necessary. My request
> to fix this fell upon deaf ears.

I don't think any workarounds are needed except for devices that don't 
comply with the spec. Asserting interrupts that have not been 
specifically enabled by the driver would meet that definition in my 
view. If a device happens to do this then maybe a workaround would be 
needed, but that's what it would be, a workaround for a broken device.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-07 18:17             ` linux-os (Dick Johnson)
@ 2006-03-08  0:05               ` Robert Hancock
  0 siblings, 0 replies; 32+ messages in thread
From: Robert Hancock @ 2006-03-08  0:05 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: Bjorn Helgaas, linux-kernel

linux-os (Dick Johnson) wrote:
> There are now other "standard" boards that seem to be experiencing
> the same problem. Maybe it is time to make a procedure that turns
> off interrupts for a specific device (not an unknown IRQ). Then
> a subsequent call turns them on after the handler is in place.
> This wouldn't affect current drivers. They would still turn on
> hot by default.

How do you propose to do this? There's no way to mask interrupts from 
just one device which is sharing an IRQ line, you have to mask 
interrupts from all of those devices. That would be quite ugly IMHO if 
one device could disable the interrupt used by another device for 
however long it felt like.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-07 17:51           ` Bjorn Helgaas
  2006-03-07 18:17             ` linux-os (Dick Johnson)
@ 2006-03-08  8:18             ` Jesse Brandeburg
  2006-03-08 16:05               ` Bjorn Helgaas
  1 sibling, 1 reply; 32+ messages in thread
From: Jesse Brandeburg @ 2006-03-08  8:18 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-os (Dick Johnson), Robert Hancock, linux-kernel

On 3/7/06, Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> On Tuesday 07 March 2006 07:21, linux-os (Dick Johnson) wrote:
> Maybe you could handle this with a PCI quirk that runs before
> pci_enable_device().  IIRC, we considered exposing a separate
> interface for PCI IRQ allocation and routing, but decided it
> wasn't worth the complexity since so few devices would need it.
>
> > Linux-2.4.x had IRQs that were stable. One could put
> > a handler in place that would handle the possible burst of interrupts
> > upon startup. Then this was changed so the IRQ value is wrong
> > until an unrelated and illogical event occurs.
>
> There are good reasons to wait to allocate the IRQ until you have
> a driver that cares about the device.  I'm sorry that this broke
> your specific case.

FWIW, I'd be interested in following up on something like this in
another thread because e100 appears to have (at least in one
reporter's dual e100 machine) a similar "hardware problem" where a
shared interrupt line gets asserted too early and the kernel prints a
Nobody Cared message.

So we have a new way of doing things that exposes more broken
hardware, shouldn't we provide a way for that hardware to continue
working?

http://bugzilla.kernel.org/show_bug.cgi?id=5918

Jesse


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-08  0:00           ` Robert Hancock
@ 2006-03-08 12:03             ` linux-os (Dick Johnson)
  2006-03-08 23:34               ` Robert Hancock
  0 siblings, 1 reply; 32+ messages in thread
From: linux-os (Dick Johnson) @ 2006-03-08 12:03 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel


On Tue, 7 Mar 2006, Robert Hancock wrote:

> linux-os (Dick Johnson) wrote:
>> Thinking that a device powers ON in a stable state is naive.
>
> I don't think so.. if you build a device that connects to the PCI bus it
> had better come up in a stable state if it wants to be compliant with
> the spec. That's what the reset line and power-up reset interval is for.
>
>> Many
>> complex devices will have FPGA devices with floating pins that don't
>> become stable until their contents are loaded serially. Others will
>> have IRQ requests based upon power-on states that need to be cleared
>> with a software reset. One can't issue a software reset until the
>> device is enabled and enabling the device may generate interrupts
>> with no handler in place so you have a "can't get there from here"
>> problem.
>
> You still aren't seeing my point. Why does enabling the device BARs
> cause the device to generate interrupts? And if there's something you
> need to do to prevent the device from generating interrupts, how can you
> do it without enabling the device?
>
> Also, the device's ISR must clear the condition which is causing the
> interrupt, otherwise interrupt storms will result. If your device can
> enter a state where the interrupt cannot be reliably cleared, how can
> you possibly comply with this?

You don't bother to read. The reported interrupt is WRONG, INVALID,
INCORRECT, BROKEN, until __after__ the device is enabled. That means
that one CANNOT put an interrupt handler in place before the
device is enabled.

It's the Linux code that was broken when 2.6.x started. Previous
Linux code never failed to report the correct IRQ.


>
>> Linux-2.4.x had IRQs that were stable. One could put
>> a handler in place that would handle the possible burst of interrupts
>> upon startup. Then this was changed so the IRQ value is wrong
>> until an unrelated and illogical event occurs. Now, you need to
>> make work-arounds that were never before necessary. My request
>> to fix this fell upon deaf ears.
>
> I don't think any workarounds are needed except for devices that don't
> comply with the spec. Asserting interrupts that have not been
> specifically enabled by the driver would meet that definition in my
> view. If a device happens to do this then maybe a workaround would be
> needed, but that's what it would be, a workaround for a broken device.
>
> --
> Robert Hancock      Saskatoon, SK, Canada
> To email, remove "nospam" from hancockr@nospamshaw.ca
> Home Page: http://www.roberthancock.com/
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-08  8:18             ` Jesse Brandeburg
@ 2006-03-08 16:05               ` Bjorn Helgaas
  2006-03-08 19:34                 ` Martin Michlmayr
  0 siblings, 1 reply; 32+ messages in thread
From: Bjorn Helgaas @ 2006-03-08 16:05 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: linux-os (Dick Johnson), Robert Hancock, linux-kernel

On Wednesday 08 March 2006 01:18, Jesse Brandeburg wrote:
> On 3/7/06, Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> > On Tuesday 07 March 2006 07:21, linux-os (Dick Johnson) wrote:
> > Maybe you could handle this with a PCI quirk that runs before
> > pci_enable_device().  IIRC, we considered exposing a separate
> > interface for PCI IRQ allocation and routing, but decided it
> > wasn't worth the complexity since so few devices would need it.
> >
> > > Linux-2.4.x had IRQs that were stable. One could put
> > > a handler in place that would handle the possible burst of interrupts
> > > upon startup. Then this was changed so the IRQ value is wrong
> > > until an unrelated and illogical event occurs.
> >
> > There are good reasons to wait to allocate the IRQ until you have
> > a driver that cares about the device.  I'm sorry that this broke
> > your specific case.
> 
> FWIW, I'd be interested in following up on something like this in
> another thread because e100 appears to have (at least in one
> reporter's dual e100 machine) a similar "hardware problem" where a
> shared interrupt line gets asserted too early and the kernel prints a
> Nobody Cared message.
> 
> So we have a new way of doing things that exposes more broken
> hardware, shouldn't we provide a way for that hardware to continue
> working?

Booting with "pci=routeirq" gives the previous behavior.

It would be interesting to know whether that makes a difference
in the e100 issue you mention.


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-08 16:05               ` Bjorn Helgaas
@ 2006-03-08 19:34                 ` Martin Michlmayr
  0 siblings, 0 replies; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-08 19:34 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Jesse Brandeburg, linux-kernel

* Bjorn Helgaas <bjorn.helgaas@hp.com> [2006-03-08 09:05]:
> Booting with "pci=routeirq" gives the previous behavior.
> 
> It would be interesting to know whether that makes a difference
> in the e100 issue you mention.

FWIW, I'm pretty sure I tried this with de2104x and it didn't help.
I'm not positive though, but I could test again if people are
interested in the result.
-- 
Martin Michlmayr
tbm@cyrius.com


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-08 12:03             ` linux-os (Dick Johnson)
@ 2006-03-08 23:34               ` Robert Hancock
  2006-03-09 12:42                 ` linux-os (Dick Johnson)
  0 siblings, 1 reply; 32+ messages in thread
From: Robert Hancock @ 2006-03-08 23:34 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: linux-kernel

linux-os (Dick Johnson) wrote:
> You don't bother to read. The reported interrupt is WRONG, INVALID,
> INCORRECT, BROKEN, until __after__ the device is enabled. That means
> that one CANNOT put an interrupt handler in place before the
> device is enabled.

And my point is, even if you COULD put an interrupt handler into place 
before enabling the device, if the device can be in an unstable state 
such that the interrupt can't be acknowledged reliably, how can you 
handle it without causing an interrupt storm?

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca




* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-08 23:34               ` Robert Hancock
@ 2006-03-09 12:42                 ` linux-os (Dick Johnson)
  0 siblings, 0 replies; 32+ messages in thread
From: linux-os (Dick Johnson) @ 2006-03-09 12:42 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel


On Wed, 8 Mar 2006, Robert Hancock wrote:

> linux-os (Dick Johnson) wrote:
>> You don't bother to read. The reported interrupt is WRONG, INVALID,
>> INCORRECT, BROKEN, until __after__ the device is enabled. That means
>> that one CANNOT put an interrupt handler in place before the
>> device is enabled.
>
> And my point is, even if you COULD put an interrupt handler into place
> before enabling the device, if the device can be in an unstable state
> such that the interrupt can't be acknowledged reliably, how can you
> handle it without causing an interrupt storm?
>

Easy. Mask off the interrupts in the device. Software should
certainly "know" if the device has been initialized to a stable
state. Until it has been initialized, the ISR will simply
clear and mask the device.
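
A minimal sketch of such a handler, written as a userspace model. The register layout, struct early_dev and early_isr() are all hypothetical, and the sketch assumes the device's status and mask registers can be written reliably at this point, which is the very thing questioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register model; names and layout are illustrative only. */
struct early_regs {
    uint32_t status;  /* pending interrupt causes */
    uint32_t mask;    /* 1 = cause may raise the interrupt line */
};

struct early_dev {
    struct early_regs regs;
    bool initialized;      /* set by the driver once setup is complete */
    unsigned int handled;  /* events handled in normal operation */
};

/* Returns true if this device was the one asserting the line. */
static bool early_isr(struct early_dev *d)
{
    assert(d);
    uint32_t pending = d->regs.status & d->regs.mask;

    if (pending == 0)
        return false;        /* not ours: matters on a shared line */

    if (!d->initialized) {
        d->regs.mask = 0;    /* silence every cause on the device */
        d->regs.status = 0;  /* model of acknowledging all causes */
        return true;
    }

    d->regs.status &= ~pending;  /* acknowledge only what was seen */
    d->handled++;
    return true;
}
```

Until the driver sets initialized, the first interrupt leaves the device fully masked, so a second call reports that the interrupt is not from this device.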

> --
> Robert Hancock      Saskatoon, SK, Canada
> To email, remove "nospam" from hancockr@nospamshaw.ca
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.50 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

* Re: de2104x: interrupts before interrupt handler is registered
       [not found]       ` <5O23T-59S-15@gated-at.bofh.it>
@ 2006-03-09  0:02         ` Robert Hancock
  0 siblings, 0 replies; 32+ messages in thread
From: Robert Hancock @ 2006-03-09  0:02 UTC (permalink / raw)
  To: Jesse Brandeburg, linux-kernel

Jesse Brandeburg wrote:
> FWIW, I'd be interested in following up on something like this in
> another thread because e100 appears to have (at least in one
> reporter's dual e100 machine) a similar "hardware problem" where a
> shared interrupt line gets asserted too early and the kernel prints a
> Nobody Cared message.
> 
> So we have a new way of doing things that exposes more broken
> hardware, shouldn't we provide a way for that hardware to continue
> working?

I'm not sure this is at all related to the case we're talking about - it 
doesn't matter whether the request_irq or pci_enable_device comes first 
as the device is pulling on the interrupt line before the driver is even 
loaded. To fix that I'd think you'd need some kind of PCI quirk that 
would shut off the interrupt on the e100 card before any devices request 
the interrupt that it is sharing.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca



* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-08  0:15           ` Francois Romieu
@ 2006-03-08  3:22             ` Martin Michlmayr
  0 siblings, 0 replies; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-08  3:22 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel

* Francois Romieu <romieu@fr.zoreil.com> [2006-03-08 01:15]:
> netdev watchdog events appear in the dmesg of the patched driver.
> The driver survived it. So I'd say that the patch does its job.
> 
> OTOH, if you ever saw the unpatched driver survive this event, yell
> now.

No, I've never seen the unpatched driver survive.
-- 
Martin Michlmayr
http://www.cyrius.com/


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-07  5:11         ` Martin Michlmayr
  2006-03-07 14:57           ` Martin Michlmayr
@ 2006-03-08  0:15           ` Francois Romieu
  2006-03-08  3:22             ` Martin Michlmayr
  1 sibling, 1 reply; 32+ messages in thread
From: Francois Romieu @ 2006-03-08  0:15 UTC (permalink / raw)
  To: Martin Michlmayr; +Cc: netdev, linux-kernel

Martin Michlmayr <tbm@cyrius.com> :
[...]
> It seems to help.  It's hard to say for sure because I don't have a
> foolproof way to reproduce this panic.  It _usually_ occurs after
> copying a few hundred MB but there's no clear trigger.  I've now copied
> a few GB around using a kernel with your patch and it hasn't crashed.

netdev watchdog events appear in the dmesg of the patched driver.
The driver survived it. So I'd say that the patch does its job.

OTOH, if you ever saw the unpatched driver survive this event, yell now.

-- 
Ueimor


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-06 19:17     ` Martin Michlmayr
  2006-03-06 19:48       ` Francois Romieu
  2006-03-06 21:17       ` Francois Romieu
@ 2006-03-07 15:16       ` Martin Michlmayr
  2 siblings, 0 replies; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-07 15:16 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel

* Martin Michlmayr <tbm@cyrius.com> [2006-03-06 19:17]:
> There's another interrupt related bug in the driver, though.  I

There's yet another bug (or two).

I just got another kernel panic:
http://www.cyrius.com/tmp/de2104x_panic2.jpg (which I haven't been
able to reproduce so far; this was without your latest patch applied,
btw).  This happened when I was doing DHCP while my server was not
responding to DHCP.  I wonder if it's related to another issue I've
observed.

This card is a D-Link DE 530 with both a BNC and RJ-45 connector.
When I boot my machine without having the Ethernet cable plugged in,
Linux thinks there's a BNC connection.  When I plug in the cable, the
link light on the card goes on but Linux doesn't seem to notice - in
fact, when I then start DHCP again, the link light goes off again and
Linux talks about BNC being up... [FWIW, Linux 2.4 doesn't handle this
situation either.  Under 2.4 the link light doesn't even come up.]


dmesg: booting without the RJ-45 cable plugged in, doing DHCP, then
plugging the RJ-45 cable in and doing DHCP again:

hda: 4999680 sectors (2559 MB) w/256KiB Cache, CHS=4960/16/63, UDMA(33)
 hda: hda1 hda2 < hda5 hda6 >
ACPI: PCI Interrupt 0000:00:0b.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
de0: SROM leaf offset 30, default media 10baseT auto
de0:   media block #0: 10baseT-FD
de0:   media block #1: BNC
de0:   media block #2: 10baseT-HD
eth0: 21041 at 0xb8802000, 00:80:c8:33:4f:96, IRQ 10
Probing IDE interface ide1...
Attempting manual resume
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Real Time Clock Driver v1.12ac
input: PC Speaker as /class/input/input1
FDC 0 is a post-1991 82077
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
Adding 136512k swap on /dev/hda5.  Priority:-1 extents:1 across:136512k
EXT3 FS on hda1, internal journal
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
eth0: enabling interface
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: set link BNC
eth0:    mode 0x7ffc0000, sia 0x10c4,0xffffef09,0xfffff7fd,0xffff0006
eth0:    set mode 0x7ffc0000, set sia 0xef09,0xf7fd,0x6
eth0: link up, media BNC
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
eth0: disabling interface
eth0: timeout expired stopping DMA
ACPI: PCI interrupt for device 0000:00:0b.0 disabled
eth0: enabling interface
eth0: set link BNC
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef09,0xfffff7fd,0xffff0006
eth0:    set mode 0x7ffc0040, set sia 0xef09,0xf7fd,0x6
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: link up, media BNC
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present


As a comparison, this happens when I boot with the RJ-45 cable plugged
in:

ACPI: PCI Interrupt 0000:00:0b.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
de0: SROM leaf offset 30, default media 10baseT auto
de0:   media block #0: 10baseT-FD
de0:   media block #1: BNC
de0:   media block #2: 10baseT-HD
eth0: 21041 at 0xb8802000, 00:80:c8:33:4f:96, IRQ 10
Probing IDE interface ide1...
Attempting manual resume
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Real Time Clock Driver v1.12ac
input: PC Speaker as /class/input/input1
FDC 0 is a post-1991 82077
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
Adding 136512k swap on /dev/hda5.  Priority:-1 extents:1 across:136512k
EXT3 FS on hda1, internal journal
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
eth0: enabling interface
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present

-- 
Martin Michlmayr
http://www.cyrius.com/


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-07  5:11         ` Martin Michlmayr
@ 2006-03-07 14:57           ` Martin Michlmayr
  2006-03-08  0:15           ` Francois Romieu
  1 sibling, 0 replies; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-07 14:57 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel

* Martin Michlmayr <tbm@cyrius.com> [2006-03-07 05:11]:
> * Francois Romieu <romieu@fr.zoreil.com> [2006-03-06 22:17]:
> > Not sure about this one, but...
> 
> It seems to help.  It's hard to say for sure because I don't have a
> foolproof way to reproduce this panic.  It _usually_ occurs after
> copying a few hundred MB but there's no clear trigger.  I've now copied
> a few GB around using a kernel with your patch and it hasn't crashed.

I'm pretty sure now that your patch helps.  I left the system running
overnight and it was still alive in the morning after transferring ~10
GB.  I do get all kinds of underrun messages (see below) but the data
got transferred alright.  I then rebooted with the kernel that doesn't
have your patch and it crashed after ~1 GB.


(this was at about 3 GB, but the same goes on and on; but the network
works.)

eth0      Link encap:Ethernet  HWaddr 00:80:C8:33:4F:96  
          inet addr:192.168.1.145  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::280:c8ff:fe33:4f96/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1199533 errors:7 dropped:0 overruns:7 frame:0
          TX packets:2344296 errors:396 dropped:252 overruns:396 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:64846004 (61.8 MiB)  TX bytes:3479989567 (3.2 GiB)
          Interrupt:10 Base address:0x2000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:560 (560.0 b)  TX bytes:560 (560.0 b)


Adding 136512k swap on /dev/hda5.  Priority:-1 extents:1 across:136512k
EXT3 FS on hda1, internal journal
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
eth0: enabling interface
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb022
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb012
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb032
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb02a
eth0: tx err, status 0x7fffb01a
eth0: tx err, status 0x7fffb02a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 15/37/38
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb012
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 54 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 16/6/7
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb012
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 60 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 41/47/48
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 32 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 55 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 43 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 2 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 6 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 17/63/0
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002

-- 
Martin Michlmayr
http://www.cyrius.com/


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-06 20:54           ` Francois Romieu
@ 2006-03-07  5:16             ` Martin Michlmayr
  0 siblings, 0 replies; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-07  5:16 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel

* Francois Romieu <romieu@fr.zoreil.com> [2006-03-06 21:54]:
> > By the way, I'm getting the following messages in dmesg:
> > eth0: tx err, status 0x7fffb002
> Tx underrun.
> Is there anything which could induce a noticeable load on the PCI bus ?

I was going to say "no" because I was simply copying some data via the
network.  However, it seems the situation is a bit more complicated
than this.  It seems that I only get these underruns using a specific
hard drive.  You see, the reason I'm rsyncing hundreds of megabytes of
data across my LAN is because my laptop hard drive is dying, so I put
it in a PC as secondary master using an adapter.  Interestingly
enough, I don't get any Tx underruns when using a different disk.
Which is strange because at the moment the disk is working fine (it
sort of started dying but seems to behave right now), so I don't know
why it would change anything.  Maybe this makes sense to someone.

By the way, I only get underruns when I rsync from the PC to another
machine - not when I rsync from the other machine to the PC.
-- 
Martin Michlmayr
http://www.cyrius.com/


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-06 21:17       ` Francois Romieu
@ 2006-03-07  5:11         ` Martin Michlmayr
  2006-03-07 14:57           ` Martin Michlmayr
  2006-03-08  0:15           ` Francois Romieu
  0 siblings, 2 replies; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-07  5:11 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel

* Francois Romieu <romieu@fr.zoreil.com> [2006-03-06 22:17]:
> Not sure about this one, but...

It seems to help.  It's hard to say for sure because I don't have a
foolproof way to reproduce this panic.  It _usually_ occurs after
copying a few hundred MB but there's no clear trigger.  I've now copied
a few GB around using a kernel with your patch and it hasn't crashed.
-- 
Martin Michlmayr
http://www.cyrius.com/


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-06 19:17     ` Martin Michlmayr
  2006-03-06 19:48       ` Francois Romieu
@ 2006-03-06 21:17       ` Francois Romieu
  2006-03-07  5:11         ` Martin Michlmayr
  2006-03-07 15:16       ` Martin Michlmayr
  2 siblings, 1 reply; 32+ messages in thread
From: Francois Romieu @ 2006-03-06 21:17 UTC (permalink / raw)
  To: Martin Michlmayr; +Cc: netdev, linux-kernel

Martin Michlmayr <tbm@cyrius.com> :
[...]
> There's another interrupt related bug in the driver, though.  I
> sometimes get a kernel panic when rsyncing several 100 megs of data
> across the LAN.  A picture showing the call trace can be found at
> http://www.cyrius.com/tmp/de2104x_panic.jpg

Not sure about this one, but...

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

diff --git a/drivers/net/tulip/de2104x.c b/drivers/net/tulip/de2104x.c
index d7fb3ff..49235e2 100644
--- a/drivers/net/tulip/de2104x.c
+++ b/drivers/net/tulip/de2104x.c
@@ -1455,6 +1455,8 @@ static void de_tx_timeout (struct net_de
 	synchronize_irq(dev->irq);
 	de_clean_rings(de);
 
+	de_init_rings(de);
+
 	de_init_hw(de);
 	
 	netif_wake_queue(dev);



* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-06 19:59         ` Martin Michlmayr
  2006-03-06 20:23           ` Francois Romieu
@ 2006-03-06 20:54           ` Francois Romieu
  2006-03-07  5:16             ` Martin Michlmayr
  1 sibling, 1 reply; 32+ messages in thread
From: Francois Romieu @ 2006-03-06 20:54 UTC (permalink / raw)
  To: Martin Michlmayr; +Cc: netdev, linux-kernel

Martin Michlmayr <tbm@cyrius.com> :
[...]
> By the way, I'm getting the following messages in dmesg:
> 
> eth0: tx err, status 0x7fffb002

Tx underrun.

Is there anything which could induce a noticeable load on the PCI bus ?
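
For reference, a minimal sketch of how one might pick such a status word apart. It assumes the usual 21x4x transmit descriptor layout (bit 15 = error summary, bit 1 = FIFO underflow, bits 3..6 = collision count); the macro and function names are invented for illustration and are not taken from the driver.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed bit positions in the Tx descriptor status word.
 * Names are illustrative only, not the driver's own. */
#define TX_ERR_SUMMARY  (1u << 15)  /* some error was reported */
#define TX_FIFO_UNDER   (1u << 1)   /* FIFO ran dry mid-frame */
#define TX_COLL_SHIFT   3           /* collision count, bits 3..6 */
#define TX_COLL_MASK    0xfu

/* Non-zero if the status word reports a Tx FIFO underrun. */
int tx_is_underrun(uint32_t status)
{
	return (status & TX_ERR_SUMMARY) && (status & TX_FIFO_UNDER);
}

/* Number of collisions recorded for this frame. */
unsigned tx_collisions(uint32_t status)
{
	return (status >> TX_COLL_SHIFT) & TX_COLL_MASK;
}

/* Print a one-line summary of a status word. */
void tx_describe(uint32_t status)
{
	printf("0x%08x: %s, collisions=%u\n",
	       (unsigned)status,
	       tx_is_underrun(status) ? "tx underrun" : "no underrun",
	       tx_collisions(status));
}
```

Under these assumptions 0x7fffb002 reads as an underrun with no collisions, and 0x7fffb00a as an underrun after one collision.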

-- 
Ueimor



* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-06 20:23           ` Francois Romieu
@ 2006-03-06 20:29             ` Martin Michlmayr
  0 siblings, 0 replies; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-06 20:29 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel

* Francois Romieu <romieu@fr.zoreil.com> [2006-03-06 21:23]:
> > http://www.cyrius.com/tmp/config-2.6.16-rc5-486
> > By the way, I'm getting the following messages in dmesg:
> netconsole appears enabled. Do you use it ?

It's a standard Debian kernel config so pretty much everything is
enabled as a module.  I didn't use netconsole.
-- 
Martin Michlmayr
http://www.cyrius.com/


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-06 19:59         ` Martin Michlmayr
@ 2006-03-06 20:23           ` Francois Romieu
  2006-03-06 20:29             ` Martin Michlmayr
  2006-03-06 20:54           ` Francois Romieu
  1 sibling, 1 reply; 32+ messages in thread
From: Francois Romieu @ 2006-03-06 20:23 UTC (permalink / raw)
  To: Martin Michlmayr; +Cc: netdev, linux-kernel

Martin Michlmayr <tbm@cyrius.com> :
[...]
> http://www.cyrius.com/tmp/config-2.6.16-rc5-486
> 
> By the way, I'm getting the following messages in dmesg:

netconsole appears enabled. Do you use it ?

-- 
Ueimor


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-06 19:48       ` Francois Romieu
@ 2006-03-06 19:59         ` Martin Michlmayr
  2006-03-06 20:23           ` Francois Romieu
  2006-03-06 20:54           ` Francois Romieu
  0 siblings, 2 replies; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-06 19:59 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel

* Francois Romieu <romieu@fr.zoreil.com> [2006-03-06 20:48]:
> > There's another interrupt related bug in the driver, though.  I
> > sometimes get a kernel panic when rsyncing several 100 megs of data
> > across the LAN.  A picture showing the call trace can be found at
> > http://www.cyrius.com/tmp/de2104x_panic.jpg
> Can you publish the .config ?

http://www.cyrius.com/tmp/config-2.6.16-rc5-486

By the way, I'm getting the following messages in dmesg:

eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb032
eth0: tx err, status 0x7fffb002

-- 
Martin Michlmayr
http://www.cyrius.com/


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-06 19:17     ` Martin Michlmayr
@ 2006-03-06 19:48       ` Francois Romieu
  2006-03-06 19:59         ` Martin Michlmayr
  2006-03-06 21:17       ` Francois Romieu
  2006-03-07 15:16       ` Martin Michlmayr
  2 siblings, 1 reply; 32+ messages in thread
From: Francois Romieu @ 2006-03-06 19:48 UTC (permalink / raw)
  To: Martin Michlmayr; +Cc: netdev, linux-kernel

Martin Michlmayr <tbm@cyrius.com> :
[...]
> There's another interrupt related bug in the driver, though.  I
> sometimes get a kernel panic when rsyncing several 100 megs of data
> across the LAN.  A picture showing the call trace can be found at
> http://www.cyrius.com/tmp/de2104x_panic.jpg

Can you publish the .config ?

-- 
Ueimor


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-06 14:35   ` Martin Michlmayr
@ 2006-03-06 19:17     ` Martin Michlmayr
  2006-03-06 19:48       ` Francois Romieu
                         ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-06 19:17 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel

* Martin Michlmayr <tbm@cyrius.com> [2006-03-06 14:35]:
> Thanks a lot for your quick response, Francois.  I can confirm that
> this patch fixes the problem for me.

There's another interrupt related bug in the driver, though.  I
sometimes get a kernel panic when rsyncing several 100 megs of data
across the LAN.  A picture showing the call trace can be found at
http://www.cyrius.com/tmp/de2104x_panic.jpg

-- 
Martin Michlmayr
http://www.cyrius.com/


* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-05 18:59 ` Francois Romieu
@ 2006-03-06 14:35   ` Martin Michlmayr
  2006-03-06 19:17     ` Martin Michlmayr
  0 siblings, 1 reply; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-06 14:35 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel

* Francois Romieu <romieu@fr.zoreil.com> [2006-03-05 19:59]:
> > I have a system on which I can reproduce this bug 100%.  While I have
> > no idea how to fix the issue, I can provide debugging information and
> > test a fix.

> (not compile-tested)

Thanks a lot for your quick response, Francois.  I can confirm that
this patch fixes the problem for me.

> -err_out_hw:
> -	spin_lock_irqsave(&de->lock, flags);
> -	de_stop_hw(de);
> -	spin_unlock_irqrestore(&de->lock, flags);

flags is no longer used now, so we get a compilation warning.  Updated
patch below.  Francois, can you please submit it with a proper
changelog entry and your Signed-off-by?


From: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>

--- a/drivers/net/tulip/de2104x.c
+++ b/drivers/net/tulip/de2104x.c
@@ -1362,7 +1362,6 @@ static int de_open (struct net_device *d
 {
 	struct de_private *de = dev->priv;
 	int rc;
-	unsigned long flags;
 
 	if (netif_msg_ifup(de))
 		printk(KERN_DEBUG "%s: enabling interface\n", dev->name);
@@ -1376,18 +1375,20 @@ static int de_open (struct net_device *d
 		return rc;
 	}
 
-	rc = de_init_hw(de);
-	if (rc) {
-		printk(KERN_ERR "%s: h/w init failure, err=%d\n",
-		       dev->name, rc);
-		goto err_out_free;
-	}
+	dw32(IntrMask, 0);
 
 	rc = request_irq(dev->irq, de_interrupt, SA_SHIRQ, dev->name, dev);
 	if (rc) {
 		printk(KERN_ERR "%s: IRQ %d request failure, err=%d\n",
 		       dev->name, dev->irq, rc);
-		goto err_out_hw;
+		goto err_out_free;
+	}
+
+	rc = de_init_hw(de);
+	if (rc) {
+		printk(KERN_ERR "%s: h/w init failure, err=%d\n",
+		       dev->name, rc);
+		goto err_out_free_irq;
 	}
 
 	netif_start_queue(dev);
@@ -1395,11 +1396,8 @@ static int de_open (struct net_device *d
 
 	return 0;
 
-err_out_hw:
-	spin_lock_irqsave(&de->lock, flags);
-	de_stop_hw(de);
-	spin_unlock_irqrestore(&de->lock, flags);
-
+err_out_free_irq:
+	free_irq(dev->irq, dev);
 err_out_free:
 	de_free_rings(de);
 	return rc;
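
The reordering in this patch can be modelled in userspace.  What follows
is a minimal sketch, not driver code: struct fake_nic and every fake_*
function are hypothetical stand-ins for the device, request_irq(),
de_init_hw() and de_free_rings(), and it assumes the chip may raise an
interrupt as soon as hardware init unmasks its interrupt sources.  It
shows the mask-first, register-second, init-last order together with the
matching goto unwinding, next to a simplified version of the old order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device; not the real struct de_private. */
struct fake_nic {
	bool irq_masked;        /* models dw32(IntrMask, 0) having been done */
	bool handler_installed; /* models a successful request_irq() */
	bool rings_allocated;   /* models the ring buffers being allocated */
	int  unhandled_irqs;    /* interrupts raised while no handler existed */
	bool fail_request_irq;  /* test knobs that force the error paths */
	bool fail_init_hw;
};

/* The device asserts its line: harmful only if unmasked and unhandled. */
static void fake_raise_irq(struct fake_nic *nic)
{
	if (!nic->irq_masked && !nic->handler_installed)
		nic->unhandled_irqs++;
}

static int fake_request_irq(struct fake_nic *nic)
{
	if (nic->fail_request_irq)
		return -1;
	nic->handler_installed = true;
	return 0;
}

static void fake_free_irq(struct fake_nic *nic)
{
	/* Freeing an IRQ that was never requested would be a bug. */
	assert(nic->handler_installed);
	nic->handler_installed = false;
}

/* Assumption: init unmasks interrupts and the chip may fire at once. */
static int fake_init_hw(struct fake_nic *nic)
{
	if (nic->fail_init_hw)
		return -1;
	nic->irq_masked = false;
	fake_raise_irq(nic);
	return 0;
}

/* Order after the patch: mask, register the handler, then init. */
static int fake_open_fixed(struct fake_nic *nic)
{
	int rc;

	nic->rings_allocated = true;
	nic->irq_masked = true;

	rc = fake_request_irq(nic);
	if (rc)
		goto err_out_free;

	rc = fake_init_hw(nic);
	if (rc)
		goto err_out_free_irq;

	return 0;

err_out_free_irq:
	fake_free_irq(nic);
err_out_free:
	nic->rings_allocated = false;
	return rc;
}

/* Order before the patch, simplified: init runs with no handler yet. */
static int fake_open_old(struct fake_nic *nic)
{
	int rc;

	nic->rings_allocated = true;

	rc = fake_init_hw(nic);
	if (rc)
		goto err_out_free;

	rc = fake_request_irq(nic);
	if (rc)
		goto err_out_free; /* the real code also stopped the hw here */

	return 0;

err_out_free:
	nic->rings_allocated = false;
	return rc;
}
```

In this model a zero-initialised struct fake_nic passed to
fake_open_old() ends up with one unhandled interrupt, while
fake_open_fixed() ends up with none.  Each error label undoes only the
steps that had already succeeded, which is why the patch adds
err_out_free_irq above err_out_free.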

-- 
Martin Michlmayr
http://www.cyrius.com/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-05 18:07 Martin Michlmayr
  2006-03-05 18:59 ` Francois Romieu
@ 2006-03-06 13:02 ` linux-os (Dick Johnson)
  1 sibling, 0 replies; 32+ messages in thread
From: linux-os (Dick Johnson) @ 2006-03-06 13:02 UTC (permalink / raw)
  To: Martin Michlmayr; +Cc: Jeff Garzik, netdev, linux-kernel


On Sun, 5 Mar 2006, Martin Michlmayr wrote:

> We have three independent reports about problems with de2104x involving
> interrupts.  Alan Stern suggested that it "sure looks as though the
> ethernet interface is generating an interrupt request before the
> de2104x driver has registered its interrupt handler".
>
> The three reports are:
> - de2104x does not work (non-fatal oops) when uhci_hcd is loaded
>   first.  http://lkml.org/lkml/2006/2/3/402  The problem does not
>   occur under 2.4 with the tulip module, so this is a regression.
> - fatal de2104x interrupt oops (without uhci_hcd).
>   http://lkml.org/lkml/2006/2/5/64
> - "kernel panic after the first transmission attempt times out"
>   Regression from 2.4.  http://bugs.debian.org/288821
>
> I have a system on which I can reproduce this bug 100%.  While I have
> no idea how to fix the issue, I can provide debugging information and
> test a fix.  However, I'm (temporarily) leaving the country in three
> weeks and won't have access to this PC for several months, so it would
> be great if someone could look into this soon.  Jeff?
>
>
> 1)
> eth0: enabling interface
> eth0: set link 10baseT auto
> eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
> eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
> irq 10: nobody cared (try booting with the "irqpoll" option)
> [<c012f89e>] __report_bad_irq+0x31/0x73
> [<c012f96d>] note_interrupt+0x75/0x98
> [<c012f46a>] __do_IRQ+0x67/0x91
> [<c0104fc1>] do_IRQ+0x19/0x24
> [<c0103afa>] common_interrupt+0x1a/0x20
> [<c0119a1c>] __do_softirq+0x2c/0x7d
> [<c0119a8f>] do_softirq+0x22/0x26
> [<c0104fc6>] do_IRQ+0x1e/0x24
> [<c0103afa>] common_interrupt+0x1a/0x20
> [<c481da07>] de_set_rx_mode+0xf/0x12 [de2104x]
> [<c481e2c1>] de_init_hw+0x6d/0x76 [de2104x]
> [<c481e59e>] de_open+0x64/0xe4 [de2104x]
> [<c0225a5f>] dev_open+0x30/0x66
> [<c0226a9a>] dev_change_flags+0x4d/0xf0
> [<c025d301>] devinet_ioctl+0x224/0x4bd
> [<c0155541>] do_ioctl+0x21/0x50
> [<c0155774>] vfs_ioctl+0x152/0x161
> [<c01557cb>] sys_ioctl+0x48/0x65
> [<c0102a99>] syscall_call+0x7/0xb
> handlers:
> [<c4890d97>] (usb_hcd_irq+0x0/0x56 [usbcore])
> Disabling IRQ #10
>
> 3)
> eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
> eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
> [__report_bad_irq+42/144] __report_bad_irq+0x2a/0x90
> [note_interrupt+108/160] note_interrupt+0x6c/0xa0
> [do_IRQ+289/304] do_IRQ+0x121/0x130
> [common_interrupt+24/32] common_interrupt+0x18/0x20
> [__do_softirq+48/128] __do_softirq+0x30/0x80
> [acpi_irq+0/22] acpi_irq+0x0/0x16
> [do_softirq+38/48] do_softirq+0x26/0x30
> [do_IRQ+253/304] do_IRQ+0xfd/0x130
> [common_interrupt+24/32] common_interrupt+0x18/0x20
> [__crc_do_softirq+25311/208152] de_set_rx_mode+0x26/0x50 [de2104x]
> [__crc_do_softirq+28277/208152] de_init_hw+0x8c/0x90 [de2104x]
> [__crc_do_softirq+29105/208152] de_open+0x68/0x140 [de2104x]
> [profile_hook+45/75] profile_hook+0x2d/0x4b
> [dev_open+203/256] dev_open+0xcb/0x100
> [dev_mc_upload+36/80] dev_mc_upload+0x24/0x50
> [dev_change_flags+81/288] dev_change_flags+0x51/0x120
> [devinet_ioctl+582/1424] devinet_ioctl+0x246/0x590
> [inet_ioctl+94/160] inet_ioctl+0x5e/0xa0
> [sock_ioctl+249/688] sock_ioctl+0xf9/0x2b0
> [sys_ioctl+269/656] sys_ioctl+0x10d/0x290
> [syscall_call+7/11] syscall_call+0x7/0xb
> eth0: link up, media 10baseT auto
>
> --
> Martin Michlmayr
> http://www.cyrius.com/
> -

This started to happen in a lot of PCI drivers once it became
necessary to call pci_enable_device() in order to make the
returned IRQ values valid. This has been reported numerous
times and has not been fixed. Basically, in order to get
the correct value, one needs to disable the board in some
unspecified way so it is not possible for it to generate
an interrupt before enabling the board. With some devices
this may not be possible!

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.47 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: de2104x: interrupts before interrupt handler is registered
  2006-03-05 18:07 Martin Michlmayr
@ 2006-03-05 18:59 ` Francois Romieu
  2006-03-06 14:35   ` Martin Michlmayr
  2006-03-06 13:02 ` linux-os (Dick Johnson)
  1 sibling, 1 reply; 32+ messages in thread
From: Francois Romieu @ 2006-03-05 18:59 UTC (permalink / raw)
  To: Martin Michlmayr; +Cc: Jeff Garzik, netdev, linux-kernel

Martin Michlmayr <tbm@cyrius.com> :
[...]
> I have a system on which I can reproduce this bug 100%.  While I have
> no idea how to fix the issue, I can provide debugging information and
> test a fix.  However, I'm (temporarily) leaving the country in three
> weeks and won't have access to this PC for several months, so it would
> be great if someone could look into this soon.  Jeff?

(not compile-tested)

diff --git a/drivers/net/tulip/de2104x.c b/drivers/net/tulip/de2104x.c
index d7fb3ff..d16a5a0 100644
--- a/drivers/net/tulip/de2104x.c
+++ b/drivers/net/tulip/de2104x.c
@@ -1376,18 +1376,20 @@ static int de_open (struct net_device *d
 		return rc;
 	}
 
-	rc = de_init_hw(de);
-	if (rc) {
-		printk(KERN_ERR "%s: h/w init failure, err=%d\n",
-		       dev->name, rc);
-		goto err_out_free;
-	}
+	dw32(IntrMask, 0);
 
 	rc = request_irq(dev->irq, de_interrupt, SA_SHIRQ, dev->name, dev);
 	if (rc) {
 		printk(KERN_ERR "%s: IRQ %d request failure, err=%d\n",
 		       dev->name, dev->irq, rc);
-		goto err_out_hw;
+		goto err_out_free;
+	}
+
+	rc = de_init_hw(de);
+	if (rc) {
+		printk(KERN_ERR "%s: h/w init failure, err=%d\n",
+		       dev->name, rc);
+		goto err_out_free_irq;
 	}
 
 	netif_start_queue(dev);
@@ -1395,11 +1397,8 @@ static int de_open (struct net_device *d
 
 	return 0;
 
-err_out_hw:
-	spin_lock_irqsave(&de->lock, flags);
-	de_stop_hw(de);
-	spin_unlock_irqrestore(&de->lock, flags);
-
+err_out_free_irq:
+	free_irq(dev->irq, dev);
 err_out_free:
 	de_free_rings(de);
 	return rc;

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* de2104x: interrupts before interrupt handler is registered
@ 2006-03-05 18:07 Martin Michlmayr
  2006-03-05 18:59 ` Francois Romieu
  2006-03-06 13:02 ` linux-os (Dick Johnson)
  0 siblings, 2 replies; 32+ messages in thread
From: Martin Michlmayr @ 2006-03-05 18:07 UTC (permalink / raw)
  To: Jeff Garzik, netdev; +Cc: linux-kernel

We have three independent reports about problems with de2104x involving
interrupts.  Alan Stern suggested that it "sure looks as though the
ethernet interface is generating an interrupt request before the
de2104x driver has registered its interrupt handler".

The three reports are:
 - de2104x does not work (non-fatal oops) when uhci_hcd is loaded
   first.  http://lkml.org/lkml/2006/2/3/402  The problem does not
   occur under 2.4 with the tulip module, so this is a regression.
 - fatal de2104x interrupt oops (without uhci_hcd).
   http://lkml.org/lkml/2006/2/5/64
 - "kernel panic after the first transmission attempt times out"
   Regression from 2.4.  http://bugs.debian.org/288821

I have a system on which I can reproduce this bug 100%.  While I have
no idea how to fix the issue, I can provide debugging information and
test a fix.  However, I'm (temporarily) leaving the country in three
weeks and won't have access to this PC for several months, so it would
be great if someone could look into this soon.  Jeff?


1)
eth0: enabling interface
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
irq 10: nobody cared (try booting with the "irqpoll" option)
 [<c012f89e>] __report_bad_irq+0x31/0x73
 [<c012f96d>] note_interrupt+0x75/0x98
 [<c012f46a>] __do_IRQ+0x67/0x91
 [<c0104fc1>] do_IRQ+0x19/0x24
 [<c0103afa>] common_interrupt+0x1a/0x20
 [<c0119a1c>] __do_softirq+0x2c/0x7d
 [<c0119a8f>] do_softirq+0x22/0x26
 [<c0104fc6>] do_IRQ+0x1e/0x24
 [<c0103afa>] common_interrupt+0x1a/0x20
 [<c481da07>] de_set_rx_mode+0xf/0x12 [de2104x]
 [<c481e2c1>] de_init_hw+0x6d/0x76 [de2104x]
 [<c481e59e>] de_open+0x64/0xe4 [de2104x]
 [<c0225a5f>] dev_open+0x30/0x66
 [<c0226a9a>] dev_change_flags+0x4d/0xf0
 [<c025d301>] devinet_ioctl+0x224/0x4bd
 [<c0155541>] do_ioctl+0x21/0x50
 [<c0155774>] vfs_ioctl+0x152/0x161
 [<c01557cb>] sys_ioctl+0x48/0x65
 [<c0102a99>] syscall_call+0x7/0xb
handlers:
[<c4890d97>] (usb_hcd_irq+0x0/0x56 [usbcore])
Disabling IRQ #10

3)
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
 [__report_bad_irq+42/144] __report_bad_irq+0x2a/0x90
 [note_interrupt+108/160] note_interrupt+0x6c/0xa0
 [do_IRQ+289/304] do_IRQ+0x121/0x130
 [common_interrupt+24/32] common_interrupt+0x18/0x20
 [__do_softirq+48/128] __do_softirq+0x30/0x80
 [acpi_irq+0/22] acpi_irq+0x0/0x16
 [do_softirq+38/48] do_softirq+0x26/0x30
 [do_IRQ+253/304] do_IRQ+0xfd/0x130
 [common_interrupt+24/32] common_interrupt+0x18/0x20
 [__crc_do_softirq+25311/208152] de_set_rx_mode+0x26/0x50 [de2104x]
 [__crc_do_softirq+28277/208152] de_init_hw+0x8c/0x90 [de2104x]
 [__crc_do_softirq+29105/208152] de_open+0x68/0x140 [de2104x]
 [profile_hook+45/75] profile_hook+0x2d/0x4b
 [dev_open+203/256] dev_open+0xcb/0x100
 [dev_mc_upload+36/80] dev_mc_upload+0x24/0x50
 [dev_change_flags+81/288] dev_change_flags+0x51/0x120
 [devinet_ioctl+582/1424] devinet_ioctl+0x246/0x590
 [inet_ioctl+94/160] inet_ioctl+0x5e/0xa0
 [sock_ioctl+249/688] sock_ioctl+0xf9/0x2b0
 [sys_ioctl+269/656] sys_ioctl+0x10d/0x290
 [syscall_call+7/11] syscall_call+0x7/0xb
eth0: link up, media 10baseT auto
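
For readers unfamiliar with the "nobody cared" message in trace 1), the
accounting behind it can be sketched in userspace.  This is a rough
model under stated assumptions, not kernel code: all fake_* names are
hypothetical, and the window of 100000 interrupts with a limit of 99900
unhandled ones is how I recall the generic spurious-IRQ check of that
era, so treat both numbers as assumptions.  The model has a single
handler on the line (as usb_hcd_irq is in trace 1) that never claims the
interrupt, and a level-triggered line that stays asserted.

```c
#include <stdbool.h>

/* Hypothetical equivalents of IRQ_NONE / IRQ_HANDLED. */
enum fake_irqreturn { FAKE_IRQ_NONE = 0, FAKE_IRQ_HANDLED = 1 };

/* Per-line bookkeeping, loosely modelled on the kernel's irq descriptor. */
struct fake_irq_line {
	unsigned int irq_count;      /* interrupts seen in the current window */
	unsigned int irqs_unhandled; /* of those, how many nobody claimed */
	bool disabled;               /* set once the line is given up on */
};

/* Assumed thresholds; see the note above. */
#define FAKE_WINDOW 100000u
#define FAKE_LIMIT   99900u

/* Called once per delivered interrupt with the handlers' verdict. */
static void fake_note_interrupt(struct fake_irq_line *line,
				enum fake_irqreturn ret)
{
	if (ret != FAKE_IRQ_HANDLED)
		line->irqs_unhandled++;

	if (++line->irq_count < FAKE_WINDOW)
		return;

	/* End of a window: decide whether the line is stuck. */
	line->irq_count = 0;
	if (line->irqs_unhandled > FAKE_LIMIT)
		line->disabled = true; /* "nobody cared", "Disabling IRQ #N" */
	line->irqs_unhandled = 0;
}

/* Another device's handler on the shared line: the interrupt is not its. */
static enum fake_irqreturn fake_other_device_handler(void)
{
	return FAKE_IRQ_NONE;
}

/*
 * A level-triggered line that stays asserted keeps re-delivering until
 * someone silences the device or the line is disabled.  Returns how many
 * interrupts were delivered, capped at max.
 */
static unsigned int fake_storm(struct fake_irq_line *line, unsigned int max)
{
	unsigned int n = 0;

	while (!line->disabled && n < max) {
		fake_note_interrupt(line, fake_other_device_handler());
		n++;
	}
	return n;
}
```

Once the line is disabled, every other device sharing it stops getting
interrupts as well, which would be consistent with the first report only
showing up when uhci_hcd is loaded first.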

-- 
Martin Michlmayr
http://www.cyrius.com/

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2006-03-09 12:42 UTC | newest]

Thread overview: 32+ messages
     [not found] <5N5Ql-30C-11@gated-at.bofh.it>
     [not found] ` <5NnDE-44v-11@gated-at.bofh.it>
2006-03-07  0:02   ` de2104x: interrupts before interrupt handler is registered Robert Hancock
2006-03-07 12:07     ` linux-os (Dick Johnson)
2006-03-07 13:58       ` Robert Hancock
2006-03-07 14:21         ` linux-os (Dick Johnson)
2006-03-07 17:51           ` Bjorn Helgaas
2006-03-07 18:17             ` linux-os (Dick Johnson)
2006-03-08  0:05               ` Robert Hancock
2006-03-08  8:18             ` Jesse Brandeburg
2006-03-08 16:05               ` Bjorn Helgaas
2006-03-08 19:34                 ` Martin Michlmayr
2006-03-08  0:00           ` Robert Hancock
2006-03-08 12:03             ` linux-os (Dick Johnson)
2006-03-08 23:34               ` Robert Hancock
2006-03-09 12:42                 ` linux-os (Dick Johnson)
     [not found] <5Nz1Y-4hZ-25@gated-at.bofh.it>
     [not found] ` <5NKTG-4F7-21@gated-at.bofh.it>
     [not found]   ` <5NLmp-5sk-5@gated-at.bofh.it>
     [not found]     ` <5NODG-1RH-3@gated-at.bofh.it>
     [not found]       ` <5O23T-59S-15@gated-at.bofh.it>
2006-03-09  0:02         ` Robert Hancock
2006-03-05 18:07 Martin Michlmayr
2006-03-05 18:59 ` Francois Romieu
2006-03-06 14:35   ` Martin Michlmayr
2006-03-06 19:17     ` Martin Michlmayr
2006-03-06 19:48       ` Francois Romieu
2006-03-06 19:59         ` Martin Michlmayr
2006-03-06 20:23           ` Francois Romieu
2006-03-06 20:29             ` Martin Michlmayr
2006-03-06 20:54           ` Francois Romieu
2006-03-07  5:16             ` Martin Michlmayr
2006-03-06 21:17       ` Francois Romieu
2006-03-07  5:11         ` Martin Michlmayr
2006-03-07 14:57           ` Martin Michlmayr
2006-03-08  0:15           ` Francois Romieu
2006-03-08  3:22             ` Martin Michlmayr
2006-03-07 15:16       ` Martin Michlmayr
2006-03-06 13:02 ` linux-os (Dick Johnson)
