All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan-Marek Glogowski <glogow@fbihome.de>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>,
	linux-usb@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Nicolas Boichat <drinkcat@chromium.org>,
	Nicolas Saenz Julienne <nsaenzjulienne@suse.de>,
	Jon Flatley <jflat@chromium.org>,
	Oliver Neukum <oneukum@suse.com>
Subject: [v3] usb: warm-reset ports on hub resume, if requested
Date: Thu, 31 Jan 2019 19:52:39 +0100	[thread overview]
Message-ID: <C9929EEB-B09F-458B-9EC8-44198D76E80D@fbihome.de> (raw)

Am 31. Januar 2019 17:53:13 MEZ schrieb Alan Stern <stern@rowland.harvard.edu>:
>This thread has been hard to follow.  I'll try to answer some of your
>concerns below...
>
>On Thu, 31 Jan 2019, Jan-Marek Glogowski wrote:
>
>> Am 31.01.19 um 15:56 schrieb Mathias Nyman:
>> > On 31.01.2019 12:42, Jan-Marek Glogowski wrote:
>> >> Am 30.01.19 um 17:56 schrieb Jan-Marek Glogowski:
>> >>> Am 30.01.19 um 15:58 schrieb Mathias Nyman:
>> >>>> On 30.01.2019 14:38, Jan-Marek Glogowski wrote:
>> >>>>
>> >>>> The hub->event bits are set when roothub polling timer function
>gets data from
>> >>>> hub_status_data(), writes the data to a status urb, and returns
>the urb.
>
>You have been concentrating on what happens with root hubs 
>(understandable, since that's where your problem occurs).  But in order
>
>to understand the code, you have to realize that it is meant to apply 
>to _all_ hubs, both root and external.
>
>> >>>> Normal operation:
>> >>>> usb_hcd_poll_rh_status()        // roothub polling timer
>function
>> >>>>    hcd->driver->hub_status_data(hcd, buffer)   ->
>xhci_hub_status_data()
>> >>>>      xhci_hub_status_data()      //returns buffer with bits set
>if a port needs attention
>> >>>>    if (buffer && urb)
>> >>>>      memcpy(urb->transfer_buffer, buffer)       // copy buffer
>bits to urb
>> >>>>      usb_hcd_giveback_urb(urb)  -> hub_irq      // calls urb
>complete callback function
>> >>>>        hub_irq()
>> >>>>          hub->event_bits = urb->transfer_buffer //
>> >>>>    mod_timer(usb_hcd_poll_rh_status, future)    // reschedule
>roothub polling timer
>> >>>>
>> >>>> I'm guessing usb_hcd_poll_rh_status() never properly completes a
>URB in your case.
>
>It's not just a question of a timer expiring.  If we were dealing with 
>an external hub, the code would have to wait until an actual USB 
>transfer had completed.  There's no way to avoid that wait.
>
>> >>>> This could be possible if roothub is polled before properly
>resumed, then the URB wouldn't
>> >>>> be available yet, and no hub->event_bits are set.
>> >>>> Because no bits are set the hub starts suspending again after
>resume, probably before roothub
>> >>>> timer polls roothub again.
>> >>>> suspend fails when it manually checks
>hcd->driver->hub_status_data() for activity, and
>> >>>> it start resuming again, and continues this in a loop.
>> >>>>
>> >>>> Either roothub polling is stopped at the suspend attempt, or
>maybe continuously rescheduled
>> >>>> into the future because as a part of suspend attempt/resume
>loop.
>> >>
>> >> So as you expected the status_urb is always NULL.
>> >>
>> >> I tried to understand more of the code by adding some more debug
>message.
>> >>
>> >> I looked at usb_hcd_resume_root_hub, which is called when we see
>the "resume root hub" dmesg.
>> >> This just queues the hcd->wakeup_work, so indeed, if
>usb_hcd_poll_rh_status is later called,
>> >> the hub might not have resumed yet and polling starts. So my
>simple approach was to add a
>> >> function to call flush_work(&hcd->wakeup_work) before
>usb_hcd_poll_rh_status. All variants I
>> >> tested killed the kernel. I also tried to add a udelay to no
>avail.
>> > 
>> > I think the idea is that if usb_hcd_poll_rh_status() finds a port
>that needs attention, but
>> > the is URB missing, then a HCD_FLAG_POLL_PENDING flag is set.
>> > 
>> > This flag is checked in resume when submitting the hub URB, and
>idea is to react on it:
>> > 
>> > hub_activate(HUB_RESUME)
>> >     usb_submit_urb(hub->urb)
>> >      rh_urb_enqueue()
>> >        rh_queue_status()
>> >           hcd->status_urb = urb;
>> >           if (HCD_POLL_PENDING(hcd)
>> >              mod_timer(&hcd->rh_timer, jiffies);
>> > 
>> > It tries to trigger usb_hcd_poll_rh_status() immediately by
>modifying the timer,
>> > but from your logs it seems that it's not fast enough:
>> > 
>> > [  485.301806] usbcore:hub_resume: hub 2-0:1.0: hub_resume
>> > [  485.301819] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0:
>get port status, actual port 0 status  = 0x2a0
>> >  ...
>> > [  485.301936] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg
>0000 evt 0000
>> > [  485.301952] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0:
>set port remote wake mask, actual port 0 status  = 0xe0002a0
>> > ...
>> > [  485.302034] usbcore:hub_suspend: hub 2-0:1.0: hub_suspend
>> > [  485.302048] usbcore:hcd_bus_suspend: usb usb2: bus auto-suspend,
>wakeup 1
>> > [  485.302067] usbcore:hcd_bus_suspend: usb usb2: suspend raced
>with wakeup event
>> > [  485.302071] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
>> > [  485.309765] usb usb2: usb_hcd_poll_rh_status, length 1,
>hcd->flags 0x18d, status_urb NULL
>> > 
>> > hcd->flags 0x18d means HCD_FLAG_POLL_PENDING  was set, but
>hub_event() workqueue and hub_suspend()
>> > are run before polling the root hub, and I guess the URB gets
>removed at some point during suspsend.
>> > 
>> > This then goes on in a loop.
>> > 
>> > You could try running usb_hcd_poll_rh_status() direcly instead of
>just kicking the timer and see if
>> > it works:
>> > 
>> > @@ -822,8 +826,11 @@ static int rh_queue_status (struct usb_hcd
>*hcd, struct urb *urb)
>> >                 mod_timer(&hcd->rh_timer, (jiffies/(HZ/4) + 1) *
>(HZ/4));
>> >  
>> >         /* If a status change has already occurred, report it ASAP
>*/
>> > -       else if (HCD_POLL_PENDING(hcd))
>> > -               mod_timer(&hcd->rh_timer, jiffies);
>> > +       else if (HCD_POLL_PENDING(hcd)) {
>> > +               spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
>> > +               usb_hcd_poll_rh_status(hcd);
>> > +               return 0;
>> > +       }
>> >         retval = 0;
>> >   done:
>> >         spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
>> > 
>> > 
>> > I didn't check locking or running contexts if it's ok to do this,
>there's could be a reason
>> > why it wasn't done like this in the first place.
>> > Worth a shot still.
>
>It's not a good idea.  Deferring the operation to a timer keeps things
>decoupled (running in different contexts with different stacks); doing
>a direct call could get into a loop of nested calls that would keep
>going deeper and deeper until it exhausted the kernel stack.
>
>Perhaps you didn't notice that at the end, hub_activate() calls
>kick_hub_wq().  That routine calls
>usb_autopm_get_interface_no_resume(), which will prevent the hub trim
>suspending until hub_event() calls usb_autopm_put_interface().  
>Therefore to make things work correctly, hub_activate() has to ensure
>that the hub->event_bits and hub->change_bits fields are set correctly,
>so that hub_event() will do the right thing when it runs (before the 
>hub is allowed to go back into suspend).
>
>Therefore if a port is in a funny state, hub_activate() should detect
>this and set one of these bit fields appropriately.  That's one reason
>why it contains all those tests for portstatus and portchange.  If a
>missing test needs to be added, that's what you should do.

That would be like the v3 of my patch, which started this thread and detected and handled the warm-reset request of the port in hub_activate() on resume. Maybe we can set some of these bits instead, or its already handled by the hub_port_reset() call; have to check.

I also included a patch in some previous mail after Mathias Nyman commented about the NULL status_urb, which added usb_hcd_poll_rh_status directly after the root hubs usb_remote_wakeup call.

A lot of stuff is deferred via workqueue or polled. As I understand it - in my case - resume, suspend and status race and miss each other in some endless loop.

As things are, I just have the new desktop HWs and a single usb-c device to test the ports of them. I have no idea what would happen with an additional non-root hub, but I can probably organize one, if I have to test it.

Jan-Marek

             reply	other threads:[~2019-01-31 18:52 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-31 18:52 Jan-Marek Glogowski [this message]
  -- strict thread matches above, loose matches on Subject: below --
2019-02-01 11:46 [v3] usb: warm-reset ports on hub resume, if requested Mathias Nyman
2019-01-31 21:11 Alan Stern
2019-01-31 16:53 Alan Stern
2019-01-31 15:13 Jan-Marek Glogowski
2019-01-31 14:56 Mathias Nyman
2019-01-31 10:42 Jan-Marek Glogowski
2019-01-30 16:56 Jan-Marek Glogowski
2019-01-30 14:58 Mathias Nyman
2019-01-30 12:38 Jan-Marek Glogowski
2019-01-25 16:21 Jan-Marek Glogowski
2019-01-25 15:14 Mathias Nyman
2019-01-18 11:28 Jan-Marek Glogowski
2019-01-16 17:07 Jan-Marek Glogowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=C9929EEB-B09F-458B-9EC8-44198D76E80D@fbihome.de \
    --to=glogow@fbihome.de \
    --cc=drinkcat@chromium.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jflat@chromium.org \
    --cc=kai.heng.feng@canonical.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    --cc=nsaenzjulienne@suse.de \
    --cc=oneukum@suse.com \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.