linux-kernel.vger.kernel.org archive mirror
* Re: ehci-hcd on CARDBUS hangs when stopping card service
       [not found] <20020523171326.GA11562@kroah.com>
@ 2002-05-23 22:32 ` David Brownell
  2002-05-24 18:49   ` Linus Torvalds
                     ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: David Brownell @ 2002-05-23 22:32 UTC (permalink / raw)
  To: Andrej.Borsenkow, linux-kernel

I've recently got some similar reports, but with fewer
facts ... it was clear to me there was a problem somewhere
in the CardBus code, since I know that cleanup via rmmod is
working fine, and in fact that's the workaround one person
had ended up with!


>>usb-ohci.c: bogus NDP=255 for OHCI usb - 09:00.1
>>(the above two statements are repeated ~4x's)

And the OHCI driver hits a related problem too ...


> IMHO sequence in cs driver should be reversed - it is not polite to remove
> hardware before giving driver a chance to cleanup :-)

Yes, absolutely.  It's turning a "clean shutdown" scenario into a
"dirty shutdown" ... a normal "rmmod" works, correctly, and from the
perspective of a device driver (if not the CardBus code) those should
be exactly the same:  two ways to start the same driver shutdown.

That current sequence (powerdown before pci_dev->remove) violates the
device tree sequencing requirement ... which I recall was one of the
key features of the original 2.4 CardBus support.  Did it change rather
recently, or has this bug really been lurking for a very long time?
I'd expect to have heard about that OHCI problem (seemingly the same root
cause) before, since there are folk using CardBus OHCI (more using EHCI!),
but nobody's reported it that I know of.
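
The sequencing requirement described above (driver remove() before the
socket is powered down) can be modelled in a few lines of userspace C.
This is only a sketch under stated assumptions: struct socket_model,
driver_remove(), socket_power_off() and both shutdown_*() helpers are
hypothetical names for illustration, not the 2.4 CardBus or PCI code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a CardBus socket with one bound driver. */
enum step { STEP_DRIVER_REMOVE = 1, STEP_POWER_OFF = 2 };

struct socket_model {
	int powered;       /* 1 while the card still has power */
	int driver_bound;  /* 1 while a driver is attached */
	int dirty_remove;  /* set if remove() ran on an unpowered card */
	enum step log[2];  /* order in which the two steps ran */
	size_t nlog;
};

static void socket_model_init(struct socket_model *s)
{
	assert(s != NULL);
	s->powered = 1;
	s->driver_bound = 1;
	s->dirty_remove = 0;
	s->nlog = 0;
}

static void log_step(struct socket_model *s, enum step st)
{
	if (s->nlog < 2)
		s->log[s->nlog++] = st;
}

/* Stand-in for pci_dev->remove(): needs live registers to quiesce. */
static void driver_remove(struct socket_model *s)
{
	if (!s->powered)
		s->dirty_remove = 1;
	s->driver_bound = 0;
	log_step(s, STEP_DRIVER_REMOVE);
}

static void socket_power_off(struct socket_model *s)
{
	s->powered = 0;
	log_step(s, STEP_POWER_OFF);
}

/* Clean order: child (driver) first, then parent (socket power). */
static void shutdown_clean(struct socket_model *s)
{
	driver_remove(s);
	socket_power_off(s);
}

/* Order reported in this thread: power down first, then remove(). */
static void shutdown_reported(struct socket_model *s)
{
	socket_power_off(s);
	driver_remove(s);
}
```

In this model only shutdown_reported() sets dirty_remove, which is the
"clean shutdown turned into dirty shutdown" effect discussed here.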

I'll hope that problem appears only in 2.4.18-6mdk, and isn't found in
other kernels.  In particular, if it's in 2.5.17 then there's a big
hole in the "new driver model" work (struct device etc)!


> Irrespectively,
> endless loop in ehci_stop does not look nice.

I partially agree.  For a clean shutdown, it's guaranteed not to be
endless.  For a dirty shutdown -- physically ejecting the card, or
the hardware having a truly nasty failure mode (one I've not seen but
which could conceivably happen) -- it's a problem to fix.

Is there a clean way to detect the "card ejected before anything calls 
pci_dev->remove()" case?  I don't really like the idea of wrapping code
around every PCI register access to detect such cases.

- Dave





* Re: ehci-hcd on CARDBUS hangs when stopping card service
  2002-05-23 22:32 ` ehci-hcd on CARDBUS hangs when stopping card service David Brownell
@ 2002-05-24 18:49   ` Linus Torvalds
  2002-05-26 13:41     ` David Woodhouse
       [not found]   ` <200205241849.g4OInTe02393@penguin.transmeta.com>
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2002-05-24 18:49 UTC (permalink / raw)
  To: linux-kernel

In article <3CED6E0B.8020501@pacbell.net>,
David Brownell  <david-b@pacbell.net> wrote:
>
>Is there a clean way to detect the "card ejected before anything calls 
>pci_dev->remove()" case?  I don't really like the idea of wrapping code
>around every PCI register access to detect such cases.

You don't have much choice with CardBus, I'm afraid.

Even if the user were to do the rmmod "before" yanking out the card,
assuming that the rmmod took a bit of time and started the "remove()"
call at the same time the card was actually removed, you'll end up in
the same situation.

It's just a fact of life with any hot-plug thing that can be removed
without software first freeing it.

On most (practically all?) machines, a device that no longer exists will
return a nice floating 0xff for device reads, so it's usually reasonably
simple to detect (0xff is often not a legal status register value for
most devices for example). 

Also, it's generally a good idea to "just say no" to endless loops in
drivers. Hardware bugs _do_ happen, and it's a lot more pleasant to have
the driver do a

	printk("Device does not respond\n");

than for the kernel to hang.
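
A bounded version of such a wait loop might look like the following
userspace sketch. It is not the actual ehci-hcd fix: poll_until_clear(),
the reg_read_fn callback and the two test readers are hypothetical names,
and the callback stands in for readl() so the code can run outside the
kernel. Treating an all-ones read as "device gone" is an assumption that
holds only for registers where that value is not legal.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register accessor; a real driver would call readl()
 * on an ioremap()ed address instead. */
typedef uint32_t (*reg_read_fn)(void *ctx);

enum poll_result { POLL_OK = 0, POLL_TIMEOUT = -1, POLL_GONE = -2 };

/*
 * Wait until (reg & mask) == 0, but give up after max_tries reads.
 * An all-ones read is treated as "device no longer present".
 */
static enum poll_result poll_until_clear(reg_read_fn read_reg, void *ctx,
					 uint32_t mask, unsigned int max_tries)
{
	unsigned int i;

	assert(read_reg != NULL);
	for (i = 0; i < max_tries; i++) {
		uint32_t v = read_reg(ctx);

		if (v == UINT32_MAX) {
			fprintf(stderr, "Device does not respond\n");
			return POLL_GONE;
		}
		if ((v & mask) == 0)
			return POLL_OK;
		/* a real driver would udelay() here between reads */
	}
	fprintf(stderr, "Device does not respond\n");
	return POLL_TIMEOUT;
}

/* Test reader: bus floats high, as after a card removal. */
static uint32_t read_floating(void *ctx)
{
	(void)ctx;
	return UINT32_MAX;
}

/* Test reader: reports busy bits until the counter in ctx hits zero. */
static uint32_t read_countdown(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return 0;
	(*remaining)--;
	return 0x3; /* arbitrary "still busy" bits for the sketch */
}
```

The caller decides what to do with POLL_TIMEOUT or POLL_GONE; the point
is only that every exit path is finite.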

		Linus


* Re: ehci-hcd on CARDBUS hangs when stopping card service
       [not found]   ` <200205241849.g4OInTe02393@penguin.transmeta.com>
@ 2002-05-25 17:15     ` David Brownell
  2002-05-25 17:54       ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: David Brownell @ 2002-05-25 17:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

>>Is there a clean way to detect the "card ejected before anything calls 
>>pci_dev->remove()" case?  I don't really like the idea of wrapping code
>>around every PCI register access to detect such cases. 
> 
> You don't have much choice with CardBus, I'm afraid
> ...
> On most (practically all?) machines, a device that no longer exists will
> return a nice floating 0xff for device reads, so it's usually reasonably
> simple to detect (0xff is often not a legal status register value for
> most devices for example). 

Seems to me it'd be worth mentioning this issue somewhere in the
documentation or source.  One could get the impression that the
main issue for a CardBus-enabled PCI driver is to make sure that
the "new style" driver APIs -- with a DEVICE_TABLE etc -- are used.
(Maybe just a brief comment in <asm/io.h> ...)


> Also, it's generally a good idea to "just say no" to endless loops in
> drivers. 

I'm hardly averse to changing that loop (which normally does have an end :)
and I expected to need to at some point.  It's interesting to me just how
long that has been there without causing problems.  In this case the root
cause is that CardBus "improper shutdown sequence" problem, so "no end"
is just a particularly nasty secondary failure mode.

- Dave




* Re: ehci-hcd on CARDBUS hangs when stopping card service
  2002-05-25 17:15     ` David Brownell
@ 2002-05-25 17:54       ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2002-05-25 17:54 UTC (permalink / raw)
  To: David Brownell; +Cc: linux-kernel



On Sat, 25 May 2002, David Brownell wrote:
>
> Seems to me it'd be worth mentioning this issue somewhere in the
> documentation or source.  One could get the impression that the
> main issue for a CardBus-enabled PCI driver is to make sure that
> the "new style" driver APIs -- with a DEVICE_TABLE etc -- are used.
> (Maybe just a brief comment in <asm/io.h> ...)

Documentation might be a good thing, indeed. I doubt <asm/io.h> is the
right place for it; people tend to look at other drivers etc. to pattern
after (as clearly shown by how often a bug in one place is replicated in
lots of other places ;)

We've actually fixed a number of these things - there are drivers that
notice removal on their own silently, and just turn it into a no-op.

> I'm hardly averse to changing that loop (which normally does have an end :)
> and I expected to need to at some point.  It's interesting to me just how
> long that has been there without causing problems.  In this case the root
> cause is that CardBus "improper shutdown sequence" problem, so "no end"
> is just a particularly nasty secondary failure mode.

Most people don't unplug their devices while they are in use, and on
CardBus the most common case ends up (I think) being the fact that the
CardBus PCI static interrupt itself is shared with the device interrupt.
So what happens is that when you remove the CardBus card, that causes the
CardBus controller to send an interrupt (for removal), but since that
interrupt is shared with the card driver, the card driver also sees the
interrupt even if the card itself is otherwise idle.

This meant that some tulip cards at least were _guaranteed_ to lock up
some time ago, simply because their interrupt handler would loop forever
on seeing the "more work" bit continually (all bits in the status register
were set due to the removal and floating data lines).

		Linus



* Re: ehci-hcd on CARDBUS hangs when stopping card service
  2002-05-24 18:49   ` Linus Torvalds
@ 2002-05-26 13:41     ` David Woodhouse
  2002-05-28 13:54       ` Ingo Oeser
  0 siblings, 1 reply; 11+ messages in thread
From: David Woodhouse @ 2002-05-26 13:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


torvalds@transmeta.com said:
> > Is there a clean way to detect the "card ejected before anything
> > calls pci_dev->remove()" case?  I don't really like the idea of
> > wrapping code around every PCI register access to detect such cases.

> You don't have much choice with CardBus, I'm afraid.

You get an interrupt _before_ the card goes away, because the pins are of
different lengths. 

As long as your driver API has some kind of abort() call to tell it that the
device is no longer present, and you manage to call that within the few
milliseconds between the card detect pin contact breaking and the rest of
the pins breaking, you should be fine.

If you're sharing interrupts and have high interrupt latency, there may be a
problem -- perhaps it would be better if in that case you could ensure that
the socket IRQ handler gets run _before_ the device IRQ handler. 
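
The ordering point can be shown with a small userspace model of a shared
interrupt line. Everything here is hypothetical (struct card_state,
socket_irq(), device_irq(), dispatch_shared_irq()); it only illustrates
why running the socket handler first keeps the device handler away from
hardware that is about to disappear.

```c
#include <assert.h>
#include <stddef.h>

struct card_state {
	int present;     /* cleared by the socket handler on card detect */
	int hw_accesses; /* counts simulated device register accesses */
	int aborted;     /* set once the driver was told the card is gone */
};

typedef void (*irq_handler_fn)(struct card_state *cs, int cd_changed);

/* Socket (bridge) handler: sees the card-detect change and aborts. */
static void socket_irq(struct card_state *cs, int cd_changed)
{
	if (cd_changed && cs->present) {
		cs->present = 0;
		cs->aborted = 1; /* stand-in for a driver abort() callback */
	}
}

/* Device handler: touches hardware only while the card is present. */
static void device_irq(struct card_state *cs, int cd_changed)
{
	(void)cd_changed;
	if (!cs->present)
		return;
	cs->hw_accesses++;
}

/* Run the handlers that share one IRQ line, in the given order. */
static void dispatch_shared_irq(irq_handler_fn const *chain, size_t n,
				struct card_state *cs, int cd_changed)
{
	size_t i;

	assert(chain != NULL && cs != NULL);
	for (i = 0; i < n; i++)
		chain[i](cs, cd_changed);
}
```

With the chain { socket_irq, device_irq } the device is never touched
after removal; with the reverse order it is touched once.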

>  Also, it's generally a good idea to "just say no" to endless loops in
> drivers. Hardware bugs _do_ happen, and it's a lot more pleasant to
> have the driver do a
> 	printk("Device does not respond\n");
> than for the kernel to hang.

Too late. On some hardware, if you try to talk to the device once it's 
gone, you're already dead. Not all the world is a PeeCee.

--
dwmw2




* RE: ehci-hcd on CARDBUS hangs when stopping card service
  2002-05-23 22:32 ` ehci-hcd on CARDBUS hangs when stopping card service David Brownell
  2002-05-24 18:49   ` Linus Torvalds
       [not found]   ` <200205241849.g4OInTe02393@penguin.transmeta.com>
@ 2002-05-27  6:27   ` Borsenkow Andrej
  2002-05-27 18:37     ` David Brownell
  2002-05-27 10:54   ` Borsenkow Andrej
  3 siblings, 1 reply; 11+ messages in thread
From: Borsenkow Andrej @ 2002-05-27  6:27 UTC (permalink / raw)
  To: 'David Brownell', linux-kernel



[I should have mentioned I am not on lkml; as it stands now I reply to
thread off web archive]

It looks like the discussion is focused on the problem of how to detect a
removed card. While this problem definitely exists, I'd like to stress that
the original report was about a normal shutdown case. Just do init 0 with the
card plugged in and the system hangs. IMHO that should be dealt with in the
first place.

As for the general case, I do not know the hardware well enough. If, as
mentioned, some architectures have a general problem accessing removed
devices, then the CardBus driver should not call the normal cleanup sequence
in this case and should just signal an abort condition to the low-level
driver (so that any in-memory structures may be removed without an attempt
to actually touch hardware).
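
One way to picture that abort path is a teardown routine that is told
whether the hardware is still reachable. This is a userspace sketch only:
struct hc_model, hc_create(), hc_halt_hardware() and hc_teardown() are
hypothetical names, and the hw_present flag is an assumed interface, not
something the 2.4 CardBus code provides.

```c
#include <assert.h>
#include <stdlib.h>

struct hc_model {
	int *queue;    /* stand-in for in-memory driver structures */
	int hw_writes; /* counts simulated register writes */
};

static struct hc_model *hc_create(void)
{
	struct hc_model *hc = calloc(1, sizeof(*hc));

	if (!hc)
		return NULL;
	hc->queue = calloc(8, sizeof(*hc->queue));
	if (!hc->queue) {
		free(hc);
		return NULL;
	}
	return hc;
}

/* Quiesce the controller; only meaningful while the card is present. */
static void hc_halt_hardware(struct hc_model *hc)
{
	hc->hw_writes++;
}

/*
 * Free all driver state. Hardware is touched only if hw_present is set.
 * Returns the number of simulated register writes that were performed.
 */
static int hc_teardown(struct hc_model *hc, int hw_present)
{
	int writes;

	assert(hc != NULL);
	if (hw_present)
		hc_halt_hardware(hc);
	writes = hc->hw_writes;
	free(hc->queue);
	free(hc);
	return writes;
}
```

Memory is released on both paths; only the register access is skipped
when the caller reports that the card is already gone.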

Regards

-andrej

P.S. I would appreciate Cc to me. Thank you.


* RE: ehci-hcd on CARDBUS hangs when stopping card service
  2002-05-23 22:32 ` ehci-hcd on CARDBUS hangs when stopping card service David Brownell
                     ` (2 preceding siblings ...)
  2002-05-27  6:27   ` Borsenkow Andrej
@ 2002-05-27 10:54   ` Borsenkow Andrej
  3 siblings, 0 replies; 11+ messages in thread
From: Borsenkow Andrej @ 2002-05-27 10:54 UTC (permalink / raw)
  To: 'David Brownell', linux-kernel

> 
> I'll hope that problem appears only in 2.4.18-6mdk, and isn't found in
> other kernels.  In particular, if it's in 2.5.17 then there's a big
> hole in the "new driver model" work (struct device etc)!
> 

Absolutely the same code is in 2.4.18 and 2.4.19-pre8 so it is not
something Mandrake has introduced. The reason it has not been noticed
before is probably that not every driver hangs here (in reported case
ohci went through even though it could not access controller properly).

-andrej


* Re: ehci-hcd on CARDBUS hangs when stopping card service
  2002-05-27  6:27   ` Borsenkow Andrej
@ 2002-05-27 18:37     ` David Brownell
  0 siblings, 0 replies; 11+ messages in thread
From: David Brownell @ 2002-05-27 18:37 UTC (permalink / raw)
  To: Borsenkow Andrej; +Cc: linux-kernel

Borsenkow Andrej wrote:
> 
> It looks like the discussion is focused on the problem of how to detect a
> removed card.

Well, there are two issues.  That's a kind of "dirty shutdown" case, for
which there's a clear need to update the EHCI driver.  I'm working on
a fix for it, presumably someone can help test the patch.


> While this problem definitely exists - I'd like to stress that the
> original report was about a normal shutdown case. Just do init 0 with
> card plugged in and system hangs. IMHO that should be dealt with in the
> first place.

... and that's the "clean shutdown" case, where it seems like the
CardBus code in the 2.4 kernels is clearly doing the wrong thing:
its operation sequence turns those into dirty shutdowns (which
then trip over the EHCI "dirty shutdown" problem).

It'd be nice if someone were to work on fixing that CardBus problem
too, since it's likely it'll break other drivers.  The fact that
nobody has stepped in there is why the discussion went the direction
it did ... :)

- Dave




* Re: ehci-hcd on CARDBUS hangs when stopping card service
  2002-05-26 13:41     ` David Woodhouse
@ 2002-05-28 13:54       ` Ingo Oeser
  0 siblings, 0 replies; 11+ messages in thread
From: Ingo Oeser @ 2002-05-28 13:54 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, linux-kernel

On Sun, May 26, 2002 at 02:41:30PM +0100, David Woodhouse wrote:
> torvalds@transmeta.com said:
> >  Also, it's generally a good idea to "just say no" to endless loops in
> > drivers. Hardware bugs _do_ happen, and it's a lot more pleasant to
> > have the driver do a
> > 	printk("Device does not respond\n");
> > than for the kernel to hang.
> 
> Too late. On some hardware, if you try to talk to the device once it's 
> gone, you're already dead. Not all the world is a PeeCee.

That happens even on a "PeeCee". 

Situation: 
   - Normal readl() of an ioremap()ed register set of a
     PCI device[1]:
   -> machine hangs hard (no magic sysrq possible, no Oops, no
      panic printed)

   - Several DWORD reads of the same register set succeeded before
     and the register is wired according to the SPEC.


Just to provide an argument here

PS: I wish I had a PCI analyzer available...

[1] TI TMS320C6x EVM Board[2], Rev. 3 
   with PCI-Bridge Chip AMCC S5933[3] hanging the host side on Busmaster DMA.

[2] PCI-ID: 0x104c:0x1002
[3] PCI-ID: 0x10e8:0x4750 or 0x10e8:0x807d or 0x10e8:0x809c
-- 
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth


* Re: ehci-hcd on CARDBUS hangs when stopping card service
  2002-05-23  6:33 Borsenkow Andrej
@ 2002-05-23  6:39 ` Borsenkow Andrej
  0 siblings, 0 replies; 11+ messages in thread
From: Borsenkow Andrej @ 2002-05-23  6:39 UTC (permalink / raw)
  To: linux-kernel

On Thu, 23 May 2002, Andrej Borsenkow wrote:

> There were several reports about Linux hanging on shutdown. In one case
> hardware was Dell Inspiron 8100 with CardBus slot (Texas Instruments
> PCI4451) and Adaptec USB2connect (aua-1420) controller.
>

Sorry, was too fast to hit send. Kernel is 2.4.18-6mdk (based on
2.4.18)

-andrej


* ehci-hcd on CARDBUS hangs when stopping card service
@ 2002-05-23  6:33 Borsenkow Andrej
  2002-05-23  6:39 ` Borsenkow Andrej
  0 siblings, 1 reply; 11+ messages in thread
From: Borsenkow Andrej @ 2002-05-23  6:33 UTC (permalink / raw)
  To: linux-kernel

There were several reports about Linux hanging on shutdown. In one case
hardware was Dell Inspiron 8100 with CardBus slot (Texas Instruments
PCI4451) and Adaptec USB2connect (aua-1420) controller.

Logs that were shown were somewhat crippled; in the above case log is:

> Although, I still get the following hang on shutdown:
>
> {HIGH TONE BEEP HERE}
> usb.c: USB disconnect on device
> usb-ohci.c: bogus NDP=255 for OHCI usb - 09:00.1
> (the above two statements are repeated ~4x's)
>
> usb-ohci.c: USB HC Takeover failed!
> usb.c: USB bus 3 deregistered
> usb.c: USB disconnect on device
> usb-ohci.c: USB HDC TakeOver failed!
> usb.c: USB bus 4 deregistered
> hcd.c: remove: 09:00.2, state 3
> usb.c: USB disconnect on device
>
> {TOTAL SYSTEM HANG HERE!}

It appears system hangs in ehci-hcd driver when removing USB tree. What I
suspect happens is:

cs driver (shutdown_socket()) first powers down the card slot and then calls
cb_free(), which finally unregisters USB. It appears that PCI reads from the
USB controller in this state return all ones (0xff bytes), which would
explain the bogus NDP=255 message. In that case it is quite probable that
ehci-hcd just loops at this place in ehci_stop, since the status bits never
read back as clear:

        while (readl (&ehci->regs->status) & (STS_ASS | STS_PSS))
                udelay (100);

Unfortunately I do not have the corresponding hardware to test, and the
person who initially reported it won't recompile the kernel with more
debugging on. So I thought I'd just pass it on even if I cannot give a patch.

IMHO sequence in cs driver should be reversed - it is not polite to remove
hardware before giving driver a chance to cleanup :-) Irrespectively,
endless loop in ehci_stop does not look nice.

TIA

-andrej

