Pavel Machek wrote:
> Hi!
>
> In 2.6.0-test1, OHCI is non-functional after first suspend/resume, and
> kills machine during second suspend/resume cycle.

OK, I can see that in 2.6.0-test2 iff there's a device connected;
that's on OHCI hardware that doesn't retain power during suspend,
which means it uses the restart() path.  Hardware that retains
power depends on slightly different logic.

> What happens is that ohci_irq gets ohci->hcca == NULL, and kills
> machine. Why is ohci->hcca == NULL? ohci_stop was called from
> hcd_panic() and freed ohci->hcca.

Of course, the HC shouldn't have died and gone down those paths;
but the "HC died" paths need to work right too.

> I believe that we should
>
> 1) not free ohci->hcca so that system has better chance surviving
> hcd_panic()

More like not calling stop() from hcd_panic.  Instead, all the
devices should be disconnected, and their urbs cleaned up.  That
way the controller will sit in a known and "safe" state (reset)
until the driver is shut down and gets stop()ped.

I think that logic just "seemed to work" before, with subtle
misbehaviors.  We're still working to make sure that we do all
the right stuff to shut down devices, no longer relying on USB
device drivers to shut themselves down properly in their
disconnect() methods.  Many haven't, which can easily lead to
oopsing on the shutdown paths that don't get used very regularly.

Eventually I suspect that the HCD glue should grow logic to try
restarting drivers after the hardware dies/resets, but first it's
important to be sure they shut down properly.

> 2) inform user when hcd panics.

With a better diagnostic though.

Here's a patch that makes things slightly better.  It's still not
fully functional yet -- I forgot how many FIXMEs are in those PM
code paths!
-- and shouldn't be merged as-is, but it works slightly better:

  - Has a more informative diagnostic message (which HC died);

  - When HC dies, mark the whole tree as unavailable so that
    new URB submissions using that HC will just fail;

  - Then hcd_panic() just disconnects all the devices, still
    keeping the root hub around.

  - OHCI-specific (should be generic, hcd-pci.c):  don't try
    resuming a halted controller.

Where "better" means that it seems functional after the first
suspend/resume cycle, and re-enumerates the device that's connected
... but there's still strangeness.  And I can see how some of it
would be generic.

- Dave
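
For readers who want the ordering spelled out: what follows is NOT the
patch and not kernel code, just a small user-space C model of the
behavior listed above.  Every name in it (model_hcd, model_hc_died(),
and so on) is hypothetical, and the error codes are placeholders.  It
assumes the simplest reading of the list: the "HC died" path only marks
the controller halted and disconnects the non-root devices, and only
the real stop() path frees the hcca.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space model only; none of these names are kernel APIs. */

enum model_state { MODEL_RUNNING, MODEL_HALTED };

#define MODEL_MAX_DEVS 4              /* slot 0 plays the root hub */

struct model_hcd {
	enum model_state state;
	void *hcca;                   /* stands in for ohci->hcca */
	bool dev_connected[MODEL_MAX_DEVS];
};

/* Bring the model controller up with only the root hub present. */
int model_start(struct model_hcd *hcd)
{
	assert(hcd != NULL);
	hcd->hcca = malloc(256);
	if (hcd->hcca == NULL)
		return -ENOMEM;
	hcd->state = MODEL_RUNNING;
	for (int i = 0; i < MODEL_MAX_DEVS; i++)
		hcd->dev_connected[i] = false;
	hcd->dev_connected[0] = true;
	return 0;
}

/* New submissions just fail once the controller is marked dead. */
int model_submit_urb(const struct model_hcd *hcd, int dev)
{
	if (hcd->state != MODEL_RUNNING)
		return -ENODEV;
	if (dev < 0 || dev >= MODEL_MAX_DEVS || !hcd->dev_connected[dev])
		return -ENODEV;
	return 0;
}

/*
 * "HC died" path: mark the tree unavailable and disconnect everything
 * except the root hub.  The hcca is deliberately left allocated, so a
 * late interrupt cannot see a NULL pointer.
 */
void model_hc_died(struct model_hcd *hcd)
{
	hcd->state = MODEL_HALTED;
	for (int i = 1; i < MODEL_MAX_DEVS; i++)
		hcd->dev_connected[i] = false;
}

/* Interrupt handler: returns true only if there was work it could do. */
bool model_irq(const struct model_hcd *hcd)
{
	if (hcd->state != MODEL_RUNNING || hcd->hcca == NULL)
		return false;
	return true;
}

/* Resume refuses to touch a halted controller. */
int model_resume(const struct model_hcd *hcd)
{
	return hcd->state == MODEL_HALTED ? -ENODEV : 0;
}

/* Driver shutdown: the only place the hcca is freed. */
void model_stop(struct model_hcd *hcd)
{
	hcd->state = MODEL_HALTED;
	for (int i = 0; i < MODEL_MAX_DEVS; i++)
		hcd->dev_connected[i] = false;
	free(hcd->hcca);
	hcd->hcca = NULL;
}
```

The point of the split is that model_hc_died() can run at an awkward
time without leaving dangling state behind it, while model_stop() runs
only when the driver is really going away.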