From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S271718AbTG2NPv (ORCPT ); Tue, 29 Jul 2003 09:15:51 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S271719AbTG2NPv (ORCPT ); Tue, 29 Jul 2003 09:15:51 -0400 Received: from mta4.rcsntx.swbell.net ([151.164.30.28]:31213 "EHLO mta4.rcsntx.swbell.net") by vger.kernel.org with ESMTP id S271718AbTG2NPf (ORCPT ); Tue, 29 Jul 2003 09:15:35 -0400 Message-ID: <3F2673C4.9010302@pacbell.net> Date: Tue, 29 Jul 2003 06:16:52 -0700 From: David Brownell User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225 X-Accept-Language: en-us, en, fr MIME-Version: 1.0 To: Pavel Machek CC: weissg@vienna.at, kernel list , linux-usb-devel@lists.sourceforge.net Subject: Re: [linux-usb-devel] OHCI problems with suspend/resume References: <20030723220805.GA278@elf.ucw.cz> In-Reply-To: <20030723220805.GA278@elf.ucw.cz> Content-Type: multipart/mixed; boundary="------------080601030404080107000309" Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org This is a multi-part message in MIME format. --------------080601030404080107000309 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Pavel Machek wrote: > Hi! > > In 2.6.0-test1, OHCI is non-functional after first suspend/resume, and > kills machine during secon suspend/resume cycle. OK, I can see that in 2.6.0-test2 iff there's a device connected; that's on OHCI hardware that doesn't retain power during suspend, which means it uses the restart() path. Hardware that retains power depends on slightly different logic. > What happens is that ohci_irq gets ohci->hcca == NULL, and kills > machine. Why is ohci->hcca == NULL? ohci_stop was called from > hcd_panic() and freed ohci->hcca. Of course, the HC shouldn't have died and gone down those paths; but the "HC died" paths need to work right too. > I believe that we should > > 1) not free ohci->hcca so that system has better chance surviving > hcd_panic() More like not calling stop() from hcd_panic. Instead, all the devices should be disconnected, and their urbs cleaned up. That way the controller will sit in a known and "safe" state (reset) until the driver is shut down and gets stop()ped. I think that logic just "seemed to work" before, with subtle misbehaviors. We're still working to make sure that we do all the right stuff to shut down devices, no longer relying on USB device drivers to shut themselves down properly in their disconnect() methods. Many haven't, which can easily lead to oopsing on the shutdown paths that don't get used very regularly. Eventually I suspect that the HCD glue should grow logic to try restarting drivers after the hardware dies/resets, but first it's important to be sure they shut down properly. > 2) inform user when hcd panics. With a better diagnostic though. Here's a patch that makes things slightly better. It's still not fully functional yet -- I forgot how many FIXMEs are in those PM code paths! -- and shouldn't be merged as-is, but it works slightly better: - Has a more informative diagnostic message (which HC died); - When HC dies, mark the whole tree as unavailable so that new URB submissions using that HC will just fail; - Then hcd_panic() just disconnects all the devices, still keeping the root hub around. - OHCI-specific (should be generic, hcd-pci.c): don't try resuming a halted controller. Where "better" means that it seems functional after the first suspend/resume cycle, and re-enumerates the device that's connected ... but there's still strangeness. And I can see how some of it would be generic. - Dave --------------080601030404080107000309 Content-Type: text/plain; name="die.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="die.patch" --- 1.68/drivers/usb/core/hcd.c Tue Jul 15 09:47:16 2003 +++ edited/drivers/usb/core/hcd.c Sun Jul 27 19:42:46 2003 @@ -1487,8 +1488,26 @@ static void hcd_panic (void *_hcd) { - struct usb_hcd *hcd = _hcd; - hcd->driver->stop (hcd); + struct usb_hcd *hcd = _hcd; + struct usb_device *hub = hcd->self.root_hub; + + hub = usb_get_dev (hub); + usb_disconnect (&hub); + + /* FIXME either try to restart, or arrange to clean up the + * hc-internal state, like usb_hcd_pci_remove() does + */ +} + +void mark_gone (struct usb_device *dev) +{ + unsigned i; + + dev->state = USB_STATE_NOTATTACHED; + for (i = 0; i < dev->maxchild; i++) { + if (dev->children [i]) + mark_gone (dev->children [i]); + } } /** @@ -1501,29 +1520,12 @@ */ void usb_hc_died (struct usb_hcd *hcd) { - struct list_head *devlist, *urblist; - struct hcd_dev *dev; - struct urb *urb; - unsigned long flags; - - /* flag every pending urb as done */ - spin_lock_irqsave (&hcd_data_lock, flags); - list_for_each (devlist, &hcd->dev_list) { - dev = list_entry (devlist, struct hcd_dev, dev_list); - list_for_each (urblist, &dev->urb_list) { - urb = list_entry (urblist, struct urb, urb_list); - dev_dbg (hcd->controller, "shutdown %s urb %p pipe %x, current status %d\n", - hcd->self.bus_name, urb, urb->pipe, urb->status); - if (urb->status == -EINPROGRESS) - urb->status = -ESHUTDOWN; - } - } - urb = (struct urb *) hcd->rh_timer.data; - if (urb) - urb->status = -ESHUTDOWN; - spin_unlock_irqrestore (&hcd_data_lock, flags); + dev_err (hcd->controller, "HC died; pending I/O will be aborted.\n"); - /* hcd->stop() needs a task context */ + /* prevent new submissions to devices in this tree */ + mark_gone (hcd->self.root_hub); + + /* then usb_disconnect() them all, in a task context */ INIT_WORK (&hcd->work, hcd_panic, hcd); (void) schedule_work (&hcd->work); } --- 1.12/drivers/usb/host/ohci-pci.c Mon Apr 14 02:51:40 2003 +++ edited/drivers/usb/host/ohci-pci.c Sun Jul 27 18:51:21 2003 @@ -199,6 +199,11 @@ int retval = 0; unsigned long flags; + if (hcd->state == USB_STATE_HALT) { + ohci_dbg (ohci, "USB restart of halted device\n"); + return -EL3HLT; + } + #ifdef CONFIG_PMAC_PBOOK { struct device_node *of_node; --------------080601030404080107000309--