All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework
@ 2016-10-21 19:33 Michal Necasek
  2016-10-21 19:45 ` Thomas Gleixner
  2016-10-21 20:57 ` Alan Stern
  0 siblings, 2 replies; 5+ messages in thread
From: Michal Necasek @ 2016-10-21 19:33 UTC (permalink / raw)
  To: stern; +Cc: tglx, michael.thayer, knut.osmundsen, linux-kernel


   Alan,

 I'll get back to you on whether increasing the timeout helps, it'll take a bit of testing. Does it actually sound plausible that some controllers could not get things done in 250ms but could in 275ms?

 I will note that according to the table in <1>, a 250ms timeout with a HZ=250 kernel (what Ubuntu uses) falls into the 4ms granularity bucket, but a 275ms timeout goes into the 32ms granularity bucket. That could change things.

 And as a bit of background... the 250ms timeout should not be a problem in a virtualized environment under normal conditions. A ~10ms timeout is. What makes things harder is that the watchdog routine reads the frame counter from the HCCA, which has to be updated asynchronously. Reading from a HC register would actually be more expensive but much more reliable in this case. But again, 250ms should be plenty...

 Another factor is that the OHCI watchdog just kills the driver the first time there's a problem, there's no recovery attempt. So it's very noticeable when this happens.
 

    Regards,
      Michal


<1> http://lxr.free-electrons.com/source/kernel/time/timer.c

----- Original Message -----
From: stern@rowland.harvard.edu
To: michael.thayer@oracle.com
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, michal.necasek@oracle.com, knut.osmundsen@oracle.com
Sent: Friday, October 21, 2016 6:54:13 PM GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework

On Fri, 21 Oct 2016, Michael Thayer wrote:

> Hello Alan (LKML on CC),
> 
> Contacting you about this on Thomas Gleixner's (also on CC) suggestion. 
> The short summary is that when Linux 4.8.0 (also tested with a few later 
> kernels) is run on a VirtualBox virtual machine with USB enabled, OHCI 
> fails with the log messages "frame counter not updated; disabled" and 
> "HC died; cleaning up".  This seems to be due to the 250 ms interval 
> watchdog running with far too short intervals, which we think is a 
> consequence of the timer wheel code rework.  I will refer you to a bug 
> filed in Launchpad<1> for a longer description.
> 
> Hope this is of interest to you.
> 
> Regards,
> 
> Michael
> 
> <1> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1634737

That bug description says the watchdog timer routine can be called
twice in a 4-ms period, even though it requests a 250-ms delay.  Is
this really true?  If it is, it sounds like a real bug in the timer 
core.

Bryan Paluch reported a similar problem and said that increasing the
timeout value to 275 ms fixed it:

	http://marc.info/?l=linux-usb&m=147670889009451&w=2

Does that patch also fix the "frame counter not updating" problem?

Alan Stern

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework
  2016-10-21 19:33 PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework Michal Necasek
@ 2016-10-21 19:45 ` Thomas Gleixner
  2016-10-21 20:57 ` Alan Stern
  1 sibling, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2016-10-21 19:45 UTC (permalink / raw)
  To: Michal Necasek; +Cc: stern, michael.thayer, knut.osmundsen, linux-kernel

On Fri, 21 Oct 2016, Michal Necasek wrote:

> I'll get back to you on whether increasing the timeout helps, it'll take
> a bit of testing. Does it actually sound plausible that some controllers
> could not get things done in 250ms but could in 275ms?

There is a bug in the new timer wheel code, which causes a timer to fire
way too early. That USB timeout is just exposing the issue.

We do not yet understand the issue completely, but we are working on it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework
  2016-10-21 19:33 PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework Michal Necasek
  2016-10-21 19:45 ` Thomas Gleixner
@ 2016-10-21 20:57 ` Alan Stern
  1 sibling, 0 replies; 5+ messages in thread
From: Alan Stern @ 2016-10-21 20:57 UTC (permalink / raw)
  To: Michal Necasek
  Cc: tglx, michael.thayer, knut.osmundsen, Bryan Paluch, USB list,
	Kernel development list

On Fri, 21 Oct 2016, Michal Necasek wrote:

>    Alan,
> 
>  I'll get back to you on whether increasing the timeout helps, it'll
> take a bit of testing. Does it actually sound plausible that some
> controllers could not get things done in 250ms but could in 275ms?

Not very, I admit.

>  I will note that according to the table in <1>, a 250ms timeout with
> a HZ=250 kernel (what Ubuntu uses) falls into the 4ms granularity
> bucket, but a 275ms timeout goes into the 32ms granularity bucket.
> That could change things.

Thomas has noted that there is definitely a bug in the new timer code.

>  And as a bit of background... the 250ms timeout should not be a
> problem in a virtualized environment under normal conditions. A ~10ms
> timeout is. What makes things harder is that the watchdog routine
> reads the frame counter from the HCCA, which has to be updated
> asynchronously. Reading from a HC register would actually be more
> expensive but much more reliable in this case. But again, 250ms
> should be plenty...
> 
>  Another factor is that the OHCI watchdog just kills the driver the
> first time there's a problem, there's no recovery attempt. So it's
> very noticeable when this happens.

I should think so!  The whole point of the watchdog is to catch bad
hardware when it starts misbehaving.  If that happens, we need to
cancel everything and kill the driver so that it doesn't take large
parts of the system down with it -- which is what used to happen.

Alan Stern

>     Regards,
>       Michal
> 
> 
> <1> http://lxr.free-electrons.com/source/kernel/time/timer.c


> ----- Original Message -----
> From: stern@rowland.harvard.edu
> To: michael.thayer@oracle.com
> Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, michal.necasek@oracle.com, knut.osmundsen@oracle.com
> Sent: Friday, October 21, 2016 6:54:13 PM GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework
> 
> On Fri, 21 Oct 2016, Michael Thayer wrote:
> 
> > Hello Alan (LKML on CC),
> > 
> > Contacting you about this on Thomas Gleixner's (also on CC) suggestion. 
> > The short summary is that when Linux 4.8.0 (also tested with a few later 
> > kernels) is run on a VirtualBox virtual machine with USB enabled, OHCI 
> > fails with the log messages "frame counter not updated; disabled" and 
> > "HC died; cleaning up".  This seems to be due to the 250 ms interval 
> > watchdog running with far too short intervals, which we think is a 
> > consequence of the timer wheel code rework.  I will refer you to a bug 
> > filed in Launchpad<1> for a longer description.
> > 
> > Hope this is of interest to you.
> > 
> > Regards,
> > 
> > Michael
> > 
> > <1> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1634737
> 
> That bug description says the watchdog timer routine can be called
> twice in a 4-ms period, even though it requests a 250-ms delay.  Is
> this really true?  If it is, it sounds like a real bug in the timer 
> core.
> 
> Bryan Paluch reported a similar problem and said that increasing the
> timeout value to 275 ms fixed it:
> 
> 	http://marc.info/?l=linux-usb&m=147670889009451&w=2
> 
> Does that patch also fix the "frame counter not updating" problem?
> 
> Alan Stern

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework
  2016-10-21 15:29 Michael Thayer
@ 2016-10-21 16:54 ` Alan Stern
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Stern @ 2016-10-21 16:54 UTC (permalink / raw)
  To: Michael Thayer
  Cc: linux-kernel, Thomas Gleixner, Michal Necasek, Knut St. Osmundsen

On Fri, 21 Oct 2016, Michael Thayer wrote:

> Hello Alan (LKML on CC),
> 
> Contacting you about this on Thomas Gleixner's (also on CC) suggestion. 
> The short summary is that when Linux 4.8.0 (also tested with a few later 
> kernels) is run on a VirtualBox virtual machine with USB enabled, OHCI 
> fails with the log messages "frame counter not updated; disabled" and 
> "HC died; cleaning up".  This seems to be due to the 250 ms interval 
> watchdog running with far too short intervals, which we think is a 
> consequence of the timer wheel code rework.  I will refer you to a bug 
> filed in Launchpad<1> for a longer description.
> 
> Hope this is of interest to you.
> 
> Regards,
> 
> Michael
> 
> <1> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1634737

That bug description says the watchdog timer routine can be called
twice in a 4-ms period, even though it requests a 250-ms delay.  Is
this really true?  If it is, it sounds like a real bug in the timer 
core.

Bryan Paluch reported a similar problem and said that increasing the
timeout value to 275 ms fixed it:

	http://marc.info/?l=linux-usb&m=147670889009451&w=2

Does that patch also fix the "frame counter not updating" problem?

Alan Stern

^ permalink raw reply	[flat|nested] 5+ messages in thread

* PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework
@ 2016-10-21 15:29 Michael Thayer
  2016-10-21 16:54 ` Alan Stern
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Thayer @ 2016-10-21 15:29 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-kernel, Thomas Gleixner, Michal Necasek, Knut St. Osmundsen

Hello Alan (LKML on CC),

Contacting you about this on Thomas Gleixner's (also on CC) suggestion. 
The short summary is that when Linux 4.8.0 (also tested with a few later 
kernels) is run on a VirtualBox virtual machine with USB enabled, OHCI 
fails with the log messages "frame counter not updated; disabled" and 
"HC died; cleaning up".  This seems to be due to the 250 ms interval 
watchdog running with far too short intervals, which we think is a 
consequence of the timer wheel code rework.  I will refer you to a bug 
filed in Launchpad<1> for a longer description.

Hope this is of interest to you.

Regards,

Michael

<1> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1634737
-- 
Michael Thayer | VirtualBox engineer
ORACLE Deutschland B.V. & Co. KG | Werkstr. 24 | D-71384 Weinstadt

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstraße 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande Handelsregister 
der Handelskammer Midden-Nederland, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Jan Schultheiss, Val Maher

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-10-21 20:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-21 19:33 PROBLEM: OHCI watchdog timeouts inside VirtualBox, probably due to timer wheel rework Michal Necasek
2016-10-21 19:45 ` Thomas Gleixner
2016-10-21 20:57 ` Alan Stern
  -- strict thread matches above, loose matches on Subject: below --
2016-10-21 15:29 Michael Thayer
2016-10-21 16:54 ` Alan Stern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.