linux-kernel.vger.kernel.org archive mirror
* another must-fix: major PS/2 mouse problem
@ 2003-06-01  1:46 Albert Cahalan
  2003-06-04  5:47 ` Yoann
       [not found] ` <3EDCF47A.1060605@ifrance.com>
  0 siblings, 2 replies; 24+ messages in thread
From: Albert Cahalan @ 2003-06-01  1:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm

Lots of people (check Google) get this message
from the kernel:

psmouse.c: Lost synchronization, throwing 2 bytes away.

(the number of bytes will be 1, 2, or 3)

At work, I get it when there is heavy NFS traffic.
The mouse goes crazy, jumping around and doing
random cut-and-paste all over everything. This is
with a decently fast and modern PC.

I'll guess that NFS and the mouse both have worker
threads fighting for CPU time, and neither is RT.




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: another must-fix: major PS/2 mouse problem
  2003-06-01  1:46 another must-fix: major PS/2 mouse problem Albert Cahalan
@ 2003-06-04  5:47 ` Yoann
       [not found]   ` <20030603232155.1488c02f.akpm@digeo.com>
       [not found] ` <3EDCF47A.1060605@ifrance.com>
  1 sibling, 1 reply; 24+ messages in thread
From: Yoann @ 2003-06-04  5:47 UTC (permalink / raw)
  To: linux-kernel

Is there a patch for this bug?

I have the same problem with my laptop: SiS630 chipset, Celeron 1.2 GHz, 256 MB of
RAM (32 MB for video), PS/2 mouse (ImPS/2), playing MP3s from an NFS
partition (100 Mb Ethernet). I haven't tried without NFS traffic, but I will
try next time I boot 2.5.70 (currently I'm running 2.4.20).

Yoann

Albert Cahalan wrote:
> Lots of people (check Google) get this message
> from the kernel:
> 
> psmouse.c: Lost synchronization, throwing 2 bytes away.
> 
> (the number of bytes will be 1, 2, or 3)
> 
> At work, I get it when there is heavy NFS traffic.
> The mouse goes crazy, jumping around and doing
> random cut-and-paste all over everything. This is
> with a decently fast and modern PC.
> 
> I'll guess that NFS and the mouse both have worker
> threads fighting for CPU time, and neither is RT.




* Re: another must-fix: major PS/2 mouse problem
       [not found]   ` <20030603232155.1488c02f.akpm@digeo.com>
@ 2003-06-04  7:47     ` Vojtech Pavlik
  2003-06-04  7:53       ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Vojtech Pavlik @ 2003-06-04  7:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Yoann, linux-kernel, Vojtech Pavlik, Albert D.Cahalan

On Tue, Jun 03, 2003 at 11:21:55PM -0700, Andrew Morton wrote:

> We believe that it may be due to the ethernet driver holding interrupts off
> for too long when the traffic is heavy.

Note that this doesn't necessarily mean that the ethernet driver
disables interrupts for too long; it just means that the computer
is servicing only the network interrupts at that time, and since the
mouse interrupt has a lower priority, it is serviced rarely and with
huge delays.

In such a case the network driver should either use interrupt mitigation
if the card supports it (reading many packets per interrupt) or
switch to polled mode.
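The difference between one interrupt per packet and the mitigated/polled approach can be shown with a toy model. This is a sketch under stated assumptions, not a real driver: `struct nic_model`, `nic_irq()`, `nic_poll()`, `deliver_burst()` and the `budget` parameter are all hypothetical names. The idea is similar in spirit to the kernel's NAPI interface: the hard interrupt only masks the NIC, and the packets are then drained in bounded passes, leaving room for other interrupts (such as the mouse) in between.

```c
#include <assert.h>

/* Hypothetical NIC model, not a real driver. */
struct nic_model {
	int rx_pending;  /* packets waiting in the receive ring */
	int irq_enabled; /* is the NIC allowed to raise interrupts? */
	int interrupts;  /* hard interrupts taken */
	int polls;       /* deferred poll passes run */
};

/* With mitigation, the hard-IRQ handler only masks the NIC and defers work. */
static void nic_irq(struct nic_model *n)
{
	n->interrupts++;
	n->irq_enabled = 0;
}

/* Deferred poll: handle at most `budget` packets per pass, so other work
 * could run between passes (not modeled here). The NIC interrupt is
 * unmasked only once the ring is empty. Returns packets handled. */
static int nic_poll(struct nic_model *n, int budget)
{
	int done = 0;

	assert(budget > 0);
	n->polls++;
	while (n->rx_pending > 0 && done < budget) {
		n->rx_pending--;
		done++;
	}
	if (n->rx_pending == 0)
		n->irq_enabled = 1;
	return done;
}

/* Deliver `count` back-to-back packets. Without mitigation each packet
 * raises its own interrupt and is handled at once; with it, packets that
 * arrive while the NIC is masked simply queue up in the ring. */
static void deliver_burst(struct nic_model *n, int count, int budget, int mitigate)
{
	for (int i = 0; i < count; i++) {
		n->rx_pending++;
		if (!mitigate) {
			n->interrupts++;
			n->rx_pending--;
		} else if (n->irq_enabled) {
			nic_irq(n);
		}
	}
	while (mitigate && n->rx_pending > 0)
		nic_poll(n, budget);
}
```

For a burst of 100 packets with a budget of 16, the unmitigated model takes 100 interrupts, while the mitigated one takes a single interrupt followed by 7 bounded poll passes.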

> Does that seem to match your observations?  Does the problem happen when
> the net traffic is high?
> 
> Which ethernet driver are you using?

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR


* Re: another must-fix: major PS/2 mouse problem
  2003-06-04  7:47     ` Vojtech Pavlik
@ 2003-06-04  7:53       ` Andrew Morton
  2003-06-04  8:00         ` Vojtech Pavlik
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-06-04  7:53 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: linux-yoann, linux-kernel, vojtech, acahalan

Vojtech Pavlik <vojtech@ucw.cz> wrote:
>
> On Tue, Jun 03, 2003 at 11:21:55PM -0700, Andrew Morton wrote:
> 
> > We believe that it may be due to the ethernet driver holding interrupts off
> > for too long when the traffic is heavy.
> 
> Note that this doesn't necessarily mean that the ethernet driver
> disables interrupts for too long; it just means that the computer
> is servicing only the network interrupts at that time, and since the
> mouse interrupt has a lower priority, it is serviced rarely and with
> huge delays.
> 
> In such a case the network driver should either use interrupt mitigation
> if the card supports it (reading many packets per interrupt) or
> switch to polled mode.

Has this problem been observed in 2.4 kernels?



* Re: another must-fix: major PS/2 mouse problem
  2003-06-04  7:53       ` Andrew Morton
@ 2003-06-04  8:00         ` Vojtech Pavlik
  2003-06-04  8:14           ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Vojtech Pavlik @ 2003-06-04  8:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-yoann, linux-kernel, vojtech, acahalan

On Wed, Jun 04, 2003 at 12:53:02AM -0700, Andrew Morton wrote:
> Vojtech Pavlik <vojtech@ucw.cz> wrote:
> >
> > On Tue, Jun 03, 2003 at 11:21:55PM -0700, Andrew Morton wrote:
> > 
> > > We believe that it may be due to the ethernet driver holding interrupts off
> > > for too long when the traffic is heavy.
> > 
> > Note that this doesn't necessarily mean that the ethernet driver
> > disables interrupts for too long; it just means that the computer
> > is servicing only the network interrupts at that time, and since the
> > mouse interrupt has a lower priority, it is serviced rarely and with
> > huge delays.
> > 
> > In such a case the network driver should either use interrupt mitigation
> > if the card supports it (reading many packets per interrupt) or
> > switch to polled mode.
> 
> Has this problem been observed in 2.4 kernels?

No, since 2.4 doesn't have the re-sync code in the mouse driver which is
triggering in this case. But problems with the machine being flooded
with interrupts from the NIC so hard that it actually cannot do anything
are quite common.
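The kind of resync logic described here can be illustrated with a small user-space model. This is a sketch, not the actual psmouse.c code: `struct ps2_model`, `ps2_byte()`, the 3-byte packet size and the 50 ms `RESYNC_TIMEOUT_MS` value are assumptions chosen for illustration. The model discards a partial packet when too much time passes between two of its bytes, which is what would happen if the mouse interrupt were starved by a busy NIC.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of a PS/2 packet assembler with a resync timeout.
 * Not the real psmouse.c: packet size and timeout are assumptions. */
#define PKT_SIZE 3
#define RESYNC_TIMEOUT_MS 50

struct ps2_model {
	unsigned char buf[PKT_SIZE];
	int pktcnt;        /* bytes collected for the current packet */
	long last_ms;      /* time the previous byte was handled */
	int packets;       /* complete packets assembled */
	int dropped_bytes; /* bytes thrown away by the resync check */
};

/* Handle one byte as seen by the interrupt handler at now_ms, i.e. after
 * any interrupt latency has already been added. */
static void ps2_byte(struct ps2_model *m, unsigned char byte, long now_ms)
{
	/* Resync: a long gap inside a packet is taken to mean lost bytes. */
	if (m->pktcnt && now_ms - m->last_ms > RESYNC_TIMEOUT_MS) {
		printf("Lost synchronization, throwing %d bytes away.\n", m->pktcnt);
		m->dropped_bytes += m->pktcnt;
		m->pktcnt = 0;
	}
	m->last_ms = now_ms;
	assert(m->pktcnt < PKT_SIZE);
	m->buf[m->pktcnt++] = byte;
	if (m->pktcnt == PKT_SIZE) {
		m->packets++; /* a real driver would report the movement here */
		m->pktcnt = 0;
	}
}
```

If the bytes were only delayed rather than lost, the late byte is then treated as the start of a new packet, so in this model the following packets are misaligned. That is one way interrupt latency alone could produce the jumping cursor and random clicks described at the start of the thread, though the model is only an illustration.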

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR


* Re: another must-fix: major PS/2 mouse problem
  2003-06-04  8:00         ` Vojtech Pavlik
@ 2003-06-04  8:14           ` Andrew Morton
  2003-06-04  8:40             ` Vojtech Pavlik
  2003-06-04 23:09             ` Albert Cahalan
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2003-06-04  8:14 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: linux-yoann, linux-kernel, vojtech, acahalan

Vojtech Pavlik <vojtech@ucw.cz> wrote:
>
> > Has this problem been observed in 2.4 kernels?
> 
>  No, since 2.4 doesn't have the re-sync code in the mouse driver which is
>  triggering in this case. But problems with the machine being flooded
>  with interrupts from the NIC so hard that it actually cannot do anything
>  are quite common.

So is the resync code doing more good than harm?



* Re: another must-fix: major PS/2 mouse problem
  2003-06-04  8:14           ` Andrew Morton
@ 2003-06-04  8:40             ` Vojtech Pavlik
  2003-06-04 19:20               ` Yoann
  2003-06-04 23:09             ` Albert Cahalan
  1 sibling, 1 reply; 24+ messages in thread
From: Vojtech Pavlik @ 2003-06-04  8:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-yoann, linux-kernel, vojtech, acahalan

On Wed, Jun 04, 2003 at 01:14:13AM -0700, Andrew Morton wrote:

> > > Has this problem been observed in 2.4 kernels?
> > 
> >  No, since 2.4 doesn't have the re-sync code in the mouse driver which is
> >  triggering in this case. But problems with the machine being flooded
> >  with interrupts from the NIC so hard that it actually cannot do anything
> >  are quite common.
> 
> So is the resync code doing more good than harm?

Hard to tell. The people for whom it does good don't complain.

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR


* Re: another must-fix: major PS/2 mouse problem
  2003-06-04  8:40             ` Vojtech Pavlik
@ 2003-06-04 19:20               ` Yoann
  0 siblings, 0 replies; 24+ messages in thread
From: Yoann @ 2003-06-04 19:20 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: Andrew Morton, linux-kernel, vojtech, acahalan

Vojtech Pavlik wrote:
> On Wed, Jun 04, 2003 at 01:14:13AM -0700, Andrew Morton wrote:
> 
> 
>>>>Has this problem been observed in 2.4 kernels?
>>>
>>> No, since 2.4 doesn't have the re-sync code in the mouse driver which is
>>> triggering in this case. But problems with the machine being flooded
>>> with interrupts from the NIC so hard that it actually cannot do anything
>>> are quite common.
>>
>>So is the resync code doing more good than harm?
> 
> 
> Hard to tell. The people for whom it does good don't complain.

I haven't rebooted my PC yet, so I'm still running 2.4.20 without any problem
with my mouse. But when I boot 2.5.70, what should I do to find where the bug
comes from? I'm a little new here, so I have never tried to locate a bug in a
kernel...

Thanks for your advice.

Yoann
--
Jugglers, like programmers, handle objects which, at first sight, seem complex 
and difficult to control. Some of them, with time and patience, manage to 
control one or the other or both at the same time, and thus become aware of 
what they are doing.




* Re: another must-fix: major PS/2 mouse problem
  2003-06-04  8:14           ` Andrew Morton
  2003-06-04  8:40             ` Vojtech Pavlik
@ 2003-06-04 23:09             ` Albert Cahalan
  1 sibling, 0 replies; 24+ messages in thread
From: Albert Cahalan @ 2003-06-04 23:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vojtech Pavlik, linux-yoann, linux-kernel, vojtech

On Wed, 2003-06-04 at 04:14, Andrew Morton wrote:
> Vojtech Pavlik <vojtech@ucw.cz> wrote:

> >> Has this problem been observed in 2.4 kernels?
> > 
> >  No, since 2.4 doesn't have the re-sync code in the mouse driver which is
> >  triggering in this case. But problems with the machine being flooded
> >  with interrupts from the NIC so hard that it actually cannot do anything
> >  are quite common.
> 
> So is the resync code doing more good than harm?

The log message is useful.

I think the resync code is a bit like the OOM killer.
We need it, but something is wrong if it ever gets used.
It also doesn't quite work the way it should.

Anyway...

I only get the problem with NFS traffic. It may be
that NFS traffic is the only way I've yet found to
generate extreme network usage though.

The system with problems is an NFSv3 client that
gets abused by an in-house version control system
based on SCCS. I suppose this is like running
"tar xf foo.tar" or "tar xf foo.tar foo" over NFS.

The hardware is:

Pentium III (Coppermine)
1002.822 MHz
Apollo chipset

# lspci -s 00:0d.0 -v
00:0d.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
        Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
        Flags: bus master, medium devsel, latency 32, IRQ 11
        I/O ports at ec00 [size=128]
        Memory at df000000 (32-bit, non-prefetchable) [size=128]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

# nfsstat -c       
Client rpc stats:
calls      retrans    authrefrsh
118380     7843       0       
Client nfs v2:
null       getattr    setattr    root       lookup     readlink   
0       0% 0       0% 0       0% 0       0% 0       0% 0       0% 
read       wrcache    write      create     remove     rename     
0       0% 0       0% 0       0% 0       0% 0       0% 0       0% 
link       symlink    mkdir      rmdir      readdir    fsstat     
0       0% 0       0% 0       0% 0       0% 0       0% 0       0% 

Client nfs v3:
null       getattr    setattr    lookup     access     readlink   
0       0% 12501  10% 114     0% 68765  58% 25538  21% 4       0% 
read       write      create     mkdir      symlink    mknod      
8830    7% 725     0% 377     0% 3       0% 1       0% 0       0% 
remove     rmdir      rename     link       readdir    readdirplus
498     0% 0       0% 367     0% 173     0% 0       0% 10      0% 
fsstat     fsinfo     pathconf   commit     
2       0% 2       0% 0       0% 470     0% 




* Re: another must-fix: major PS/2 mouse problem
       [not found]     ` <3EDD8850.9060808@ifrance.com>
@ 2003-07-23  0:44       ` Albert Cahalan
  2003-07-24 17:30         ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Albert Cahalan @ 2003-07-23  0:44 UTC (permalink / raw)
  To: Yoann; +Cc: linux-kernel, Andrew Morton, vortex, jgarzik

I may have found the problem!

On Tue, 2003-06-03 at 15:18, Yoann wrote:

> I have the same problem with my laptop: SiS630 chipset,
> Celeron 1.2 GHz, 256 MB of RAM (32 MB for video), PS/2
> mouse (ImPS/2), playing MP3s from an NFS partition
> (100 Mb Ethernet). I haven't tried without NFS traffic,
> but I will try next time I boot 2.5.70

Using the lockmeter on a 2.5.75 kernel, I discovered
that boomerang_interrupt() grabs a spinlock for over
1/4 second. No joke, 253 ms. Interrupts are off AFAIK.

Mouse behavior is terrible.

It should be no surprise that NTP isn't working too
well either. The ntpd daemon keeps complaining about
losing sync and having to advance the clock by amounts
of over 100 seconds.

Could somebody with the hardware manual take a look
at that function?




* Re: another must-fix: major PS/2 mouse problem
  2003-07-23  0:44       ` Albert Cahalan
@ 2003-07-24 17:30         ` Andrew Morton
  2003-07-25  1:46           ` Albert Cahalan
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-07-24 17:30 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-yoann, linux-kernel, akpm, vortex, jgarzik

Albert Cahalan <albert@users.sourceforge.net> wrote:
>
> Using the lockmeter on a 2.5.75 kernel, I discovered
> that boomerang_interrupt() grabs a spinlock for over
> 1/4 second. No joke, 253 ms. Interrupts are off AFAIK.

boomerang_interrupt() doesn't disable interrupts.  Is the NIC sharing the
mouse's IRQ line?

boomerang_interrupt() is only used by nasty old NICs and yes, I guess it is
possible that something has gone wrong and is causing occasional long spins
in there.

But I am more suspecting that you're not really using boomerang_interrupt()
at all, and that something has gone wrong with lockmeter.  What sort of NIC
are you using?

Bear in mind that if some other device generates an interrupt while the CPU
is running boomerang_interrupt(), lockmeter will count the time spent in
that other device's interrupt as "time spent in boomerang_interrupt()".
Which is very true, but it is not much help when one is trying to identify
the source of the problem.

Perhaps what you should do is to do an rdtsc on entry and exit of do_IRQ()
and print stuff out when "long" periods of time in do_IRQ() are noticed.
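A user-space sketch of that suggestion: time a handler from entry to exit and print only when it runs longer than a threshold. Several things here are assumptions, not kernel code: `clock_gettime(CLOCK_MONOTONIC)` stands in for rdtsc, `timed_irq()` is a hypothetical wrapper playing the role of instrumented do_IRQ(), `LONG_IRQ_NS` (200 µs) is an arbitrary threshold, and `spin_1ms()` is an example handler.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Stand-in for rdtsc: a monotonic nanosecond clock. This is an assumption
 * for user space; the suggestion in the thread is to read the TSC. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

#define LONG_IRQ_NS 200000LL /* hypothetical "long" threshold: 200 us */

/* Time one handler invocation, as entry/exit instrumentation in do_IRQ()
 * would, and report it if it exceeds the threshold. Returns elapsed ns. */
static long long timed_irq(int irq, void (*handler)(void))
{
	long long start, elapsed;

	assert(handler != NULL);
	start = now_ns();
	handler();
	elapsed = now_ns() - start;
	if (elapsed > LONG_IRQ_NS)
		printf("IRQ %d: handler took %lld us\n", irq, elapsed / 1000);
	return elapsed;
}

/* Example handler that busy-waits for about 1 ms. */
static void spin_1ms(void)
{
	long long start = now_ns();

	while (now_ns() - start < 1000000LL)
		;
}
```

Printing only above a threshold keeps the log quiet during normal operation, so that only the outliers show up.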



* Re: another must-fix: major PS/2 mouse problem
  2003-07-24 17:30         ` Andrew Morton
@ 2003-07-25  1:46           ` Albert Cahalan
  2003-07-26  3:19             ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Albert Cahalan @ 2003-07-25  1:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Albert Cahalan, linux-yoann, linux-kernel mailing list,
	Andrew Morton, vortex, jgarzik

On Thu, 2003-07-24 at 13:30, Andrew Morton wrote:
> Albert Cahalan <albert@users.sourceforge.net> wrote:

> > Using the lockmeter on a 2.5.75 kernel, I discovered
> > that boomerang_interrupt() grabs a spinlock for over
> > 1/4 second. No joke, 253 ms. Interrupts are off AFAIK.
> 
> boomerang_interrupt() doesn't disable interrupts.  Is the NIC sharing the
> mouse's IRQ line?

No.

           CPU0       
  0:     746770          XT-PIC  timer
  1:        936          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  4:          9          XT-PIC  serial
  5:          0          XT-PIC  uhci-hcd, uhci-hcd
 11:       2417          XT-PIC  eth0
 12:         60          XT-PIC  i8042
 14:      13844          XT-PIC  ide0
 15:          2          XT-PIC  ide1
NMI:          0 
LOC:     751552 
ERR:          0
MIS:          0


> boomerang_interrupt() is only used by nasty old NICs and yes, I guess it is
> possible that something has gone wrong and is causing occasional long spins
> in there.
> 
> But I am more suspecting that you're not really using boomerang_interrupt()
> at all, and that something has gone wrong with lockmeter.  What sort of NIC
> are you using?

I hope you don't consider a 100 Mb/s PCI device to be
a nasty old NIC. It's not an NE2000 you know! I have this:

00:0d.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
        Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
        Flags: bus master, medium devsel, latency 32, IRQ 11
        I/O ports at ec00 [size=128]
        Memory at df001000 (32-bit, non-prefetchable) [size=128]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

Without heavy net usage, boomerang_interrupt can take
as long as 1950 microseconds. That would be from mounting
an NFS filesystem and receiving broadcast packets.
I didn't have an opportunity to hit NFS hard today.

That's from rdtsc on a 1002-MHz Pentium III.

> Bear in mind that if some other device generates an interrupt while the CPU
> is running boomerang_interrupt(), lockmeter will count the time spent in
> that other device's interrupt as "time spent in boomerang_interrupt()".
> Which is very true, but it is not much help when one is trying to identify
> the source of the problem.

Do the Intel IRQ controller priority rules play a role here?

> Perhaps what you should do is to do an rdtsc on entry and exit of do_IRQ()
> and print stuff out when "long" periods of time in do_IRQ() are noticed.

I added code to the top and bottom of do_IRQ, as well as to
the top and bottom of boomerang_interrupt. The lockmeter was
compiled into the kernel but never enabled. I record the
minimum and maximum time in microseconds.

-------------------------------
IRQ    num use      min     max
--- ------ -------- --- -------   
  0 746770 timer     40  103595
  1    936 i8042     13  389773
  2      0 cascade    -       -
  3      - -          -       -
  4      9 serial    28      56
  5      0 uhci-hcd   -       -
  6      - -        711     711
  7      - -         25      25
  8      - -          -       -
  9      - -          -       -
 10      - -          -       -
 11   2417 eth0      87 1535331
 12     60 i8042     18  102895
 13      - -          -       -
 14  13844 ide0       8   51944
 15      2 ide1       7      11 
NMI      0
LOC 751552
ERR      0
MIS      0
-------------------------------





* Re: another must-fix: major PS/2 mouse problem
  2003-07-25  1:46           ` Albert Cahalan
@ 2003-07-26  3:19             ` Andrew Morton
  2003-07-26 15:16               ` Zwane Mwaikambo
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-07-26  3:19 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: albert, linux-yoann, linux-kernel, akpm, vortex, jgarzik

Albert Cahalan <albert@users.sourceforge.net> wrote:
>
> I hope you don't consider a 100 Mb/s PCI device to be
> a nasty old NIC. It's not an NE2000 you know! I have this:

Sorry, I got my boomerangs and vortices mixed up. Vortex is the ancient one.

> I added code to the top and bottom of do_IRQ, as well as to
> the top and bottom of boomerang_interrupt. The lockmeter was
> compiled into the kernel but never enabled. I record the
> minimum and maximum time in microseconds.
> 
> -------------------------------
> IRQ    num use      min     max
> --- ------ -------- --- -------   
>   0 746770 timer     40  103595
>   1    936 i8042     13  389773
>   2      0 cascade    -       -
>   3      - -          -       -
>   4      9 serial    28      56
>   5      0 uhci-hcd   -       -
>   6      - -        711     711
>   7      - -         25      25
>   8      - -          -       -
>   9      - -          -       -
>  10      - -          -       -
>  11   2417 eth0      87 1535331
>  12     60 i8042     18  102895
>  13      - -          -       -
>  14  13844 ide0       8   51944
>  15      2 ide1       7      11 

But did your instrumentation account for nested interrupts?  What happens
if a slow i8042 interrupt happens in the middle of a 3c59x interrupt?

Still, that probably doesn't account for the stalls.

I don't know what does account for it, frankly.  You could try dropping the
2.4 driver into the 2.5 tree just to verify that it is not a driver
problem.  The driver has hardly changed at all.
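One way to account for nested interrupts is to keep a stack of active handlers and charge each handler's total time to its parent as "nested" time, so that inclusive time (what a plain entry/exit timer reports) can be separated from exclusive time. The sketch below assumes caller-supplied timestamps (for example TSC readings). `struct irq_acct`, `irq_enter()`, `irq_exit()`, `MAX_NEST` and `NR_IRQS` are all hypothetical names for illustration.

```c
#include <assert.h>

#define MAX_NEST 16
#define NR_IRQS 16

/* Hypothetical per-IRQ accounting separating inclusive time (nested
 * handlers included) from exclusive time (nested handlers subtracted). */
struct irq_acct {
	long long start[MAX_NEST];  /* entry time of each active frame */
	long long nested[MAX_NEST]; /* time spent in handlers nested inside it */
	int irq[MAX_NEST];          /* IRQ number of each active frame */
	int depth;                  /* number of active frames */
	long long inclusive[NR_IRQS];
	long long exclusive[NR_IRQS];
};

static void irq_enter(struct irq_acct *a, int irq, long long now)
{
	assert(a->depth < MAX_NEST && irq >= 0 && irq < NR_IRQS);
	a->irq[a->depth] = irq;
	a->start[a->depth] = now;
	a->nested[a->depth] = 0;
	a->depth++;
}

static void irq_exit(struct irq_acct *a, long long now)
{
	assert(a->depth > 0);
	a->depth--;
	int d = a->depth;
	long long total = now - a->start[d];

	a->inclusive[a->irq[d]] += total;
	a->exclusive[a->irq[d]] += total - a->nested[d];
	if (d > 0)
		a->nested[d - 1] += total; /* charge our time to the interrupted handler */
}
```

For example, if an IRQ 11 handler runs from t=0 to t=450 and an IRQ 1 handler interrupts it from t=10 to t=400, the IRQ 11 handler has 450 units of inclusive time but only 60 units of exclusive time.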



* Re: another must-fix: major PS/2 mouse problem
  2003-07-26  3:19             ` Andrew Morton
@ 2003-07-26 15:16               ` Zwane Mwaikambo
  2003-07-29  2:55                 ` Albert Cahalan
  0 siblings, 1 reply; 24+ messages in thread
From: Zwane Mwaikambo @ 2003-07-26 15:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Albert Cahalan, linux-yoann, linux-kernel, akpm, vortex, jgarzik

On Fri, 25 Jul 2003, Andrew Morton wrote:

> But did your instrumentation account for nested interrupts?  What happens
> if a slow i8042 interrupt happens in the middle of a 3c59x interrupt?

Just to verify that, he could remove the local_irq_enable() that is
done for handlers registered without SA_INTERRUPT, so handlers cannot nest.

	Zwane
-- 
function.linuxpower.ca


* Re: another must-fix: major PS/2 mouse problem
  2003-07-26 15:16               ` Zwane Mwaikambo
@ 2003-07-29  2:55                 ` Albert Cahalan
  2003-07-29  3:14                   ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Albert Cahalan @ 2003-07-29  2:55 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Andrew Morton, Albert Cahalan, linux-yoann,
	linux-kernel mailing list, Andrew Morton, vortex, jgarzik

On Sat, 2003-07-26 at 11:16, Zwane Mwaikambo wrote:
> On Fri, 25 Jul 2003, Andrew Morton wrote:
> 
> > But did your instrumentation account for nested interrupts?  What happens
> > if a slow i8042 interrupt happens in the middle of a 3c59x interrupt?
> 
> Just to verify that, he could remove the local_irq_enable for 
> !SA_INTERRUPT.

OK, I did this. Now, in microseconds, I get:

------------------------
IRQ use      min     max
--- -------- --- -------   
  0 timer     40  103968
  1 i8042     14    1138 (was 389773)
  2 cascade    -       -
  3 -          -       -
  4 serial    29      56
  5 uhci-hcd   -       -
  6 -        690     690
  7 -         40      40
  8 -          -       -
  9 -          -       -
 10 -          -       -
 11 eth0      73   31332 (was 1535331)
 12 i8042     18     215 (was 102895)
 13 -          -       -
 14 ide0       7   43846
 15 ide1       7      12 
------------------------
   
boomerang_interrupt itself takes 4 to 59 microseconds.

Then I switched to 2.6.0-test2. Testing more, I get the
problem with or without SMP and with or without
preemption. Here's a chunk of my log file:

Loosing too many ticks!
TSC cannot be used as a timesource. (Are you running with SpeedStep?)
Falling back to a sane timesource.
psmouse.c: Lost synchronization, throwing 3 bytes away.
psmouse.c: Lost synchronization, throwing 1 bytes away.

Arrrrgh! The TSC is my only good time source!
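For a rough sense of scale, the arithmetic below counts how many periodic timer ticks fall due during a stall. It assumes HZ=1000 (1 ms per tick, the i386 default in 2.5 and 2.6-test kernels) and reuses the 103968 µs worst case for IRQ 0 from the table above. `ticks_during_stall()` is a hypothetical helper. It does not model the kernel's own lost-tick detection or the threshold at which the kernel falls back from the TSC.

```c
#include <assert.h>

/* Number of periodic timer ticks that fall due during a stall of
 * stall_us microseconds at hz ticks per second. Hypothetical helper:
 * arithmetic only, not the kernel's lost-tick logic. */
static long long ticks_during_stall(long long stall_us, long long hz)
{
	assert(stall_us >= 0 && hz > 0);
	return stall_us * hz / 1000000LL;
}
```

With these numbers, `ticks_during_stall(103968, 1000)` is 103, compared with 10 at HZ=100. That is consistent with the "Loosing too many ticks!" message above, though the sketch does not show the kernel's actual decision rule.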

Remember that this is a pretty normal system. I have
a Red Hat 8 install w/ required upgrades, ext3, IDE,
a 1-GHz Pentium III, a boring VIA chipset, etc.

To reproduce, I do some PS/2 mouse movement while
doing one of:

a. Lots of concurrent write() and sync() activity to ext3.
b. Lots of NFSv3 traffic.




* Re: another must-fix: major PS/2 mouse problem
  2003-07-29  2:55                 ` Albert Cahalan
@ 2003-07-29  3:14                   ` Andrew Morton
  2003-07-29 12:40                     ` Albert Cahalan
  2003-07-30  5:08                     ` Pavel Machek
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2003-07-29  3:14 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: zwane, albert, linux-yoann, linux-kernel, akpm, vortex, jgarzik

Albert Cahalan <albert@users.sourceforge.net> wrote:
>
> OK, I did this. Now, in microseconds, I get:
> 
> ------------------------
> IRQ use      min     max
> --- -------- --- -------   
>   0 timer     40  103968
>   1 i8042     14    1138 (was 389773)
>   2 cascade    -       -
>   3 -          -       -
>   4 serial    29      56
>   5 uhci-hcd   -       -
>   6 -        690     690
>   7 -         40      40
>   8 -          -       -
>   9 -          -       -
>  10 -          -       -
>  11 eth0      73   31332 (was 1535331)
>  12 i8042     18     215 (was 102895)
>  13 -          -       -
>  14 ide0       7   43846
>  15 ide1       7      12 
> ------------------------
>    
> boomerang_interrupt itself takes 4 to 59 microseconds.

So this looks OK, yes?  (Is that instrumentation patch productisable?
Looks handy, albeit a subset of microstate accounting.)

> Then I switched to 2.6.0-test2. Testing more, I get the
> problem with or without SMP and with or without
> preemption. Here's a chunk of my log file:
> 
> Loosing too many ticks!
> TSC cannot be used as a timesource. (Are you running with SpeedStep?)
> Falling back to a sane timesource.
> psmouse.c: Lost synchronization, throwing 3 bytes away.
> psmouse.c: Lost synchronization, throwing 1 bytes away.
> 
> Arrrrgh! The TSC is my only good time source!

Arrrgh!  More PS/2 problems!

I think the lost synchronisation is the problem, would you agree?

The person who fixes this gets a Nobel prize.

> Remember that this is a pretty normal system. I have
> a Red Hat 8 install w/ required upgrades, ext3, IDE,
> a 1-GHz Pentium III, a boring VIA chipset, etc.
> 
> To reproduce, I do some PS/2 mouse movement while
> doing one of:
> 
> a. Lots of concurrent write() and sync() activity to ext3.
> b. Lots of NFSv3 traffic.

i.e., lots of interrupt traffic causes the PS/2 driver to go wacky?



* Re: another must-fix: major PS/2 mouse problem
  2003-07-29  3:14                   ` Andrew Morton
@ 2003-07-29 12:40                     ` Albert Cahalan
  2003-07-29 18:58                       ` Andrew Morton
  2003-07-30  5:08                     ` Pavel Machek
  1 sibling, 1 reply; 24+ messages in thread
From: Albert Cahalan @ 2003-07-29 12:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Albert Cahalan, zwane, linux-yoann, linux-kernel mailing list,
	Andrew Morton, vortex, jgarzik

On Mon, 2003-07-28 at 23:14, Andrew Morton wrote:
> Albert Cahalan <albert@users.sourceforge.net> wrote:

> > OK, I did this. Now, in microseconds, I get:
> > 
> > ------------------------
> > IRQ use      min     max
> > --- -------- --- -------   
> >   0 timer     40  103968
> >   1 i8042     14    1138 (was 389773)
> >   2 cascade    -       -
> >   3 -          -       -
> >   4 serial    29      56
> >   5 uhci-hcd   -       -
> >   6 -        690     690
> >   7 -         40      40
> >   8 -          -       -
> >   9 -          -       -
> >  10 -          -       -
> >  11 eth0      73   31332 (was 1535331)
> >  12 i8042     18     215 (was 102895)
> >  13 -          -       -
> >  14 ide0       7   43846
> >  15 ide1       7      12 
> > ------------------------
> >    
> > boomerang_interrupt itself takes 4 to 59 microseconds.
> 
> So this looks OK, yes?

I suppose boomerang_interrupt itself is OK.
Spending 104 ms in IRQ 0, 31 ms in IRQ 11, and
44 ms in IRQ 14 is not at all OK. I was hoping
to get under 200 microseconds for everything.

> (Is that instrumentation patch productisable?
> Looks handy, albeit a subset of microstate accounting.)

Not really. I printk() when a value exceeds the
saved maximum, then scan my logs for the first
and last values. There's also hard-coded knowledge
of my 1-GHz CPU, which lets me convert to microseconds
as follows:  us = (unsigned)(ns64>>3)/125u;

(that lets me handle up to 32 seconds)
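The conversion trick, written out: on a 1-GHz CPU one TSC cycle is about 1 ns, so µs = ns / 1000 = (ns / 8) / 125. Shifting right by 3 first means the division can be done in 32-bit arithmetic, presumably to avoid a 64-bit division, which 32-bit x86 kernel code normally has to route through do_div(). The exact limit is 2^35 - 1 ns, about 34.4 s, so "up to 32 seconds" is a safe round figure. The function name is hypothetical, and the sketch assumes a 32-bit result, as `unsigned` is on i386.

```c
#include <assert.h>
#include <stdint.h>

/* The 1-GHz shortcut: one TSC cycle is about 1 ns, so
 * us = ns / 1000 = (ns / 8) / 125. Shifting first lets the
 * division be done in 32-bit arithmetic. */
static uint32_t cycles_to_us_1ghz(uint64_t ns64)
{
	/* Beyond 2^35 - 1 ns (about 34.4 s) the cast would silently truncate. */
	assert((ns64 >> 3) <= UINT32_MAX);
	return (uint32_t)(ns64 >> 3) / 125u;
}
```

For example, the 253 ms lockmeter figure (253000000 cycles at 1 GHz) converts to 253000 µs, and the largest safe input, 34359738367, converts to 34359738 µs.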

Huh. So the minimum value is really the first value.
Later values could be less, but that's not important.
I suppose that true min/max via a /proc file would
be pretty easy to implement. I like my 1-GHz hack.
I like a TSC that measures in nanoseconds too.
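A true running minimum and maximum per IRQ, rather than "first logged value" and "largest printk'd value", only needs a small accumulator. The sketch below is hypothetical (`struct irq_stats`, `irq_stats_add()` and `NR_IRQS` are made-up names). Exporting it through /proc is not shown, and in a kernel the update would need to run with interrupts off or use per-CPU data.

```c
#include <assert.h>
#include <stdint.h>

#define NR_IRQS 16

/* Hypothetical running statistics per IRQ: count, true minimum and maximum. */
struct irq_stats {
	uint32_t count[NR_IRQS];
	uint32_t min_us[NR_IRQS];
	uint32_t max_us[NR_IRQS];
};

/* Record one handler duration (in microseconds) for the given IRQ. */
static void irq_stats_add(struct irq_stats *s, int irq, uint32_t us)
{
	assert(irq >= 0 && irq < NR_IRQS);
	if (s->count[irq] == 0 || us < s->min_us[irq])
		s->min_us[irq] = us;
	if (us > s->max_us[irq])
		s->max_us[irq] = us;
	s->count[irq]++;
}
```

Feeding it 87, 73 and 1535331 for IRQ 11 yields a minimum of 73 µs and a maximum of 1535331 µs, whereas a "first value" scheme would report 87 as the minimum.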

> > Then I switched to 2.6.0-test2. Testing more, I get the
> > problem with or without SMP and with or without
> > preemption. Here's a chunk of my log file:
> > 
> > Loosing too many ticks!
> > TSC cannot be used as a timesource. (Are you running with SpeedStep?)
> > Falling back to a sane timesource.
> > psmouse.c: Lost synchronization, throwing 3 bytes away.
> > psmouse.c: Lost synchronization, throwing 1 bytes away.
> > 
> > Arrrrgh! The TSC is my only good time source!
> 
> Arrrgh!  More PS/2 problems!
> 
> I think the lost synchronisation is the problem, would you agree?

It's one problem. It's a problem other people have seen.
My TSC should be good though; I'd like to use it.
At times ntpd (the NTP daemon) gets really unhappy with
the situation, yanking my clock ahead by up to 10 minutes
to compensate for lost time.

> The person who fixes this gets a Nobel prize.
> 
> > Remember that this is a pretty normal system. I have
> > a Red Hat 8 install w/ required upgrades, ext3, IDE,
> > a 1-GHz Pentium III, a boring VIA chipset, etc.
> > 
> > To reproduce, I do some PS/2 mouse movement while
> > doing one of:
> > 
> > a. Lots of concurrent write() and sync() activity to ext3.
> > b. Lots of NFSv3 traffic.
> 
> i.e., lots of interrupt traffic causes the PS/2 driver to go wacky?

I guess so. The ext3+IDE behavior seems to lift the blame
from boomerang_interrupt. Using ext3+IDE, I seem to need
a couple minutes to reproduce the problem. NFSv3+Ethernet
will give me the problem almost instantly.





* Re: another must-fix: major PS/2 mouse problem
  2003-07-29 12:40                     ` Albert Cahalan
@ 2003-07-29 18:58                       ` Andrew Morton
  2003-07-29 19:36                         ` Zwane Mwaikambo
  2003-07-29 19:43                         ` Chris Friesen
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2003-07-29 18:58 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: albert, zwane, linux-yoann, linux-kernel, akpm, vortex, jgarzik

Albert Cahalan <albert@users.sourceforge.net> wrote:
>
> > So this looks OK, yes?
> 
> I suppose boomerang_interrupt itself is OK.
> Spending 104 ms in IRQ 0, 31 ms in IRQ 11, and
> 44 ms in IRQ 14 is not at all OK. I was hoping
> to get under 200 microseconds for everything.

I misread that.

Last time I checked (which was about 18 months ago) the maximum
interrupts-off time on a 500MHz desktop-class machine was 80 microseconds.

Something is broken there.  Do you have another machine to sanity check
against?


* Re: another must-fix: major PS/2 mouse problem
  2003-07-29 18:58                       ` Andrew Morton
@ 2003-07-29 19:36                         ` Zwane Mwaikambo
  2003-07-29 19:43                         ` Chris Friesen
  1 sibling, 0 replies; 24+ messages in thread
From: Zwane Mwaikambo @ 2003-07-29 19:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Albert Cahalan, linux-yoann, linux-kernel, akpm, vortex, jgarzik

On Tue, 29 Jul 2003, Andrew Morton wrote:

> Albert Cahalan <albert@users.sourceforge.net> wrote:
> >
> > > So this looks OK, yes?
> > 
> > I suppose boomerang_interrupt itself is OK.
> > Spending 104 ms in IRQ 0, 31 ms in IRQ 11, and
> > 44 ms in IRQ 14 is not at all OK. I was hoping
> > to get under 200 microseconds for everything.
>
> I misread that.
> 
> Last time I checked (which was about 18 months ago) the maximum
> interrupts-off time on a 500MHz desktop-class machine was 80 microseconds.

IDE has traditionally been a small headache in that department. I need to
find out how it fares in 2.5.
 

-- 
function.linuxpower.ca


* Re: another must-fix: major PS/2 mouse problem
  2003-07-29 18:58                       ` Andrew Morton
  2003-07-29 19:36                         ` Zwane Mwaikambo
@ 2003-07-29 19:43                         ` Chris Friesen
  1 sibling, 0 replies; 24+ messages in thread
From: Chris Friesen @ 2003-07-29 19:43 UTC
  To: Andrew Morton
  Cc: Albert Cahalan, zwane, linux-yoann, linux-kernel, akpm, vortex, jgarzik

Andrew Morton wrote:

> Last time I checked (which was about 18 months ago) the maximum
> interrupts-off time on a 500MHz desktop-class machine was 80 microseconds.

You might want to bump that up a little bit. Querying the carrier signal on
a Tulip chip takes 100 usecs with interrupts off.

Doesn't make any difference here though.

Chris


-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com



* Re: another must-fix: major PS/2 mouse problem
  2003-07-29  3:14                   ` Andrew Morton
  2003-07-29 12:40                     ` Albert Cahalan
@ 2003-07-30  5:08                     ` Pavel Machek
  2003-07-30  6:32                       ` Andrew Morton
  2003-07-30 12:29                       ` Albert Cahalan
  1 sibling, 2 replies; 24+ messages in thread
From: Pavel Machek @ 2003-07-30  5:08 UTC
  To: Andrew Morton
  Cc: Albert Cahalan, zwane, linux-yoann, linux-kernel, akpm, vortex,
	jgarzik, vojtech

Hi!

> > Loosing too many ticks!
> > TSC cannot be used as a timesource. (Are you running with SpeedStep?)
> > Falling back to a sane timesource.
> > psmouse.c: Lost synchronization, throwing 3 bytes away.
> > psmouse.c: Lost synchronization, throwing 1 bytes away.
> > 
> > Arrrrgh! The TSC is my only good time source!
> 
> Arrrgh!  More PS/2 problems!
> 
> I think the lost synchronisation is the problem, would you agree?
> 
> The person who fixes this gets a Nobel prize.


If you set the PS/2 synchronization timeout to 20 seconds, you are going to make Vojtech
unhappy (he likes that code :-), but at least 2.6.0 will not be worse than 2.4.x...

Do you want me to create a patch?
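The mechanism under discussion can be modelled roughly as below. This is a simplified Python sketch, not the psmouse.c code: the packet size and the inter-byte timeout (`timeout_ms`) are hypothetical parameters, and the model also applies the standard PS/2 rule that bit 3 of a packet's first byte is always set. If the gap since the previous byte exceeds the timeout, the partial packet is thrown away, which is the kind of event behind "Lost synchronization, throwing N bytes away". A very long timeout simply means that branch almost never fires.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PacketAssembler:
    """Simplified model of PS/2 mouse packet assembly with a resync timeout.

    packet_size and timeout_ms are hypothetical parameters for this sketch,
    not values taken from psmouse.c.
    """
    packet_size: int = 3
    timeout_ms: float = 500.0
    packet: List[int] = field(default_factory=list)
    last_ms: Optional[float] = None
    packets: List[Tuple[int, ...]] = field(default_factory=list)
    discarded: List[int] = field(default_factory=list)  # sizes of dropped runs

    def feed(self, byte: int, now_ms: float) -> None:
        # A long silence in the middle of a packet suggests a byte was lost,
        # so throw the partial packet away and start over.
        if (self.packet and self.last_ms is not None
                and now_ms - self.last_ms > self.timeout_ms):
            self.discarded.append(len(self.packet))
            self.packet = []
        self.last_ms = now_ms

        # Bit 3 of the first byte of a standard PS/2 packet is always set,
        # so a byte without it cannot start a packet.
        if not self.packet and not byte & 0x08:
            self.discarded.append(1)
            return

        self.packet.append(byte)
        if len(self.packet) == self.packet_size:
            self.packets.append(tuple(self.packet))
            self.packet = []
```

Feeding bytes 0x08 and 0x01 at 0 ms and 1 ms, then 0x02 after a 700 ms stall, discards the two-byte partial packet (and then the stray third byte) with `timeout_ms=500`, while `timeout_ms=20000` accepts the late byte and completes the packet. Raising the timeout avoids discarding bytes that are merely late; it does not make them arrive any sooner.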
-- 
				Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...



* Re: another must-fix: major PS/2 mouse problem
  2003-07-30  5:08                     ` Pavel Machek
@ 2003-07-30  6:32                       ` Andrew Morton
  2003-07-30 12:29                       ` Albert Cahalan
  1 sibling, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2003-07-30  6:32 UTC
  To: Pavel Machek; +Cc: linux-kernel, vojtech

Pavel Machek <pavel@ucw.cz> wrote:
>
> Hi!
> 
> > > Loosing too many ticks!
> > > TSC cannot be used as a timesource. (Are you running with SpeedStep?)
> > > Falling back to a sane timesource.
> > > psmouse.c: Lost synchronization, throwing 3 bytes away.
> > > psmouse.c: Lost synchronization, throwing 1 bytes away.
> > > 
> > > Arrrrgh! The TSC is my only good time source!
> > 
> > Arrrgh!  More PS/2 problems!
> > 
> > I think the lost synchronisation is the problem, would you agree?
> > 
> > The person who fixes this gets a Nobel prize.
> 
> 
> If you set ps/2 synchronization timeout to 20 seconds, you are going to make vojtech
> unhappy (he likes that code :-), but at least 2.6.0 will not be worse than 2.4.x...

2.6 is currently much worse than 2.4: we're buried in what appear to be
many different varieties of PS/2 bug reports.


> Do you want me to create a patch?

Well, I do not know what the problem with synchronisation is, nor what
solution you propose.

But yeah, I like patches ;)




* Re: another must-fix: major PS/2 mouse problem
  2003-07-30  5:08                     ` Pavel Machek
  2003-07-30  6:32                       ` Andrew Morton
@ 2003-07-30 12:29                       ` Albert Cahalan
  1 sibling, 0 replies; 24+ messages in thread
From: Albert Cahalan @ 2003-07-30 12:29 UTC
  To: linux-kernel mailing list
  Cc: mikpe, 0, pavel, Andrew Morton, Albert Cahalan, zwane,
	linux-yoann, vojtech

On Wed, 2003-07-30 at 01:08, Pavel Machek wrote:
> Hi!
> 
> > > Loosing too many ticks!
> > > TSC cannot be used as a timesource. (Are you running with SpeedStep?)
> > > Falling back to a sane timesource.
> > > psmouse.c: Lost synchronization, throwing 3 bytes away.
> > > psmouse.c: Lost synchronization, throwing 1 bytes away.
> > > 
> > > Arrrrgh! The TSC is my only good time source!
> > 
> > Arrrgh!  More PS/2 problems!
> > 
> > I think the lost synchronisation is the problem, would you agree?
> > 
> > The person who fixes this gets a Nobel prize.
> 
> 
> If you set ps/2 synchronization timeout to 20 seconds, you are going to make vojtech
> unhappy (he likes that code :-), but at least 2.6.0 will not be worse than 2.4.x...
> 
> Do you want me to create a patch?

No. That will just hide one symptom of the problem,
making things more difficult to debug.

It won't fix my clock, which the ntpd program keeps
complaining about. Under heavy load, my clock falls
behind so much that ntpd gives up on the gentle treatment
and just yanks the clock forward by as much as 10 minutes.
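Why ntpd ends up stepping rather than slewing can be sketched as follows. The 128 ms step threshold and the 500 ppm maximum slew rate are the commonly documented defaults, but treat them as assumptions here; the function names are hypothetical. A clock that loses time faster than 500 ppm (0.05%) cannot be held in line by slewing alone, so the offset keeps growing until it crosses the step threshold and ntpd jumps the clock.

```python
# Assumed values: ntpd's default step threshold and the maximum slew rate,
# as commonly documented. Adjust if your setup differs.
STEP_THRESHOLD_S = 0.128
MAX_SLEW_PPM = 500.0


def correction(offset_s: float) -> str:
    """Return how an offset would be corrected: gradually or in one jump."""
    return "step" if abs(offset_s) > STEP_THRESHOLD_S else "slew"


def slew_time_s(offset_s: float) -> float:
    """Seconds needed to amortize offset_s at the maximum slew rate."""
    return abs(offset_s) / (MAX_SLEW_PPM * 1e-6)


def slew_can_keep_up(drift_ppm: float) -> bool:
    """Slewing can only cancel a drift no faster than MAX_SLEW_PPM."""
    return abs(drift_ppm) <= MAX_SLEW_PPM
```

Under these assumptions a 128 ms offset takes about 256 s to slew away, while a drift of 2 minutes per hour is roughly 33,000 ppm, far beyond what slewing can absorb.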

It won't make the mouse run well. Maybe you'd stop the
mouse from going crazy from time to time, but there would
still be temporary freezes now and then. (not OK!)

It won't convince Linux that my TSC isn't broken.
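The reasoning behind that complaint can be illustrated with a hypothetical Python model of TSC-based lost-tick accounting. It is not the kernel's timer code; the class name and the 100-interrupt threshold are assumptions. Each timer interrupt converts the TSC delta since the previous one into ticks; anything beyond one tick is counted as lost and credited to `jiffies`, and if lost ticks keep showing up interrupt after interrupt, the TSC gets the blame.

```python
from dataclasses import dataclass


@dataclass
class LostTickDetector:
    """Hypothetical model of TSC-based lost-tick accounting.

    Not the kernel's timer code; the give-up rule and threshold are
    assumptions chosen to illustrate the idea.
    """
    cycles_per_tick: int           # TSC cycles per timer tick, e.g. cpu_hz // HZ
    max_bad_in_a_row: int = 100    # assumed limit before the TSC is distrusted
    last_tsc: int = 0
    jiffies: int = 0
    bad_in_a_row: int = 0
    tsc_trusted: bool = True

    def timer_interrupt(self, tsc: int) -> None:
        """Account for one timer interrupt observed at TSC value tsc."""
        elapsed_ticks = (tsc - self.last_tsc) // self.cycles_per_tick
        self.last_tsc = tsc
        # More than one tick's worth of cycles means earlier timer
        # interrupts were missed; credit them so jiffies keeps up.
        lost = max(elapsed_ticks - 1, 0)
        self.jiffies += 1 + lost
        if lost:
            self.bad_in_a_row += 1
            if self.bad_in_a_row > self.max_bad_in_a_row:
                # Persistent "lost" ticks: blame the TSC and stop using it.
                self.tsc_trusted = False
        else:
            self.bad_in_a_row = 0
```

In this model a perfectly good TSC combined with timer interrupts that really are being delayed looks exactly like a broken TSC.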

It won't solve Mikael Pettersson's problem, posted
under the subject "[BUG] 2.6.0-test2 loses time on 486".
He writes:

"My old 486 test box is losing time at an alarming rate
when running 2.6.0-test kernels. It loses almost 2 minutes
per hour, less if it sits idle. This problem does not
occur when it's running a 2.4 kernel."

Gee, I get that too, on a 1 GHz Pentium III. It seems
we're all losing LOTS of clock ticks and other interrupts.
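To put the 486 numbers in tick terms: losing almost 2 minutes per hour means roughly 1 in 30 timer interrupts goes unaccounted for. The helper below (name hypothetical) does that arithmetic. HZ=1000 for 2.6 and HZ=100 for 2.4 on i386 are the usual defaults and are assumed here.

```python
def lost_ticks_per_second(lost_s: float, period_s: float, hz: int) -> float:
    """Timer ticks that must go uncounted each second for a clock ticking
    at hz to lose lost_s seconds of time every period_s seconds."""
    return lost_s / period_s * hz
```

With those values that is about 33 missed ticks per second on 2.6 versus about 3 on 2.4. A tick also lasts 1 ms at HZ=1000 but 10 ms at HZ=100, so an interrupts-off stretch that a 2.4 kernel would absorb without losing a tick can cost a 2.6 kernel several.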

I took the net-related email addresses off the Cc: list.
Please leave me on it so I don't have to break threading.




* Re: another must-fix: major PS/2 mouse problem
@ 2003-07-30 19:18 Mikael Pettersson
  0 siblings, 0 replies; 24+ messages in thread
From: Mikael Pettersson @ 2003-07-30 19:18 UTC
  To: albert, linux-kernel; +Cc: 0, akpm, linux-yoann, pavel, vojtech, zwane

On 30 Jul 2003 08:29:32 -0400, Albert Cahalan wrote:
>> > > psmouse.c: Lost synchronization, throwing 3 bytes away.
>> > > psmouse.c: Lost synchronization, throwing 1 bytes away.
>> > > 
>> > > Arrrrgh! The TSC is my only good time source!
>> > 
>> > Arrrgh!  More PS/2 problems!
>> > 
>> > I think the lost synchronisation is the problem, would you agree?
>> > 
>> > The person who fixes this gets a Nobel prize.
...
>It won't make the mouse run well. Maybe you'd stop the
>mouse from going crazy from time to time, but there'd
>still temporary freezes from time to time. (not OK!)

FWIW, the problems my Dell Latitude had with the external
mice I use with it were significantly reduced once I added
"psmouse_noext" to the kernel's command line. That one
change eliminated all lost sync messages and general craziness
after resumes from suspended state.

To make the mouse move at proper speed w/o jerkiness I
also had to tweak the rate and scaling programmed into it
to match 2.4 defaults. (rate 100, scale 2:1)
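For reference, those two settings map onto standard PS/2 mouse commands: Set Sample Rate is 0xF3 followed by the rate in samples per second, and Set Scaling 2:1 is 0xE7 (0xE6 selects 1:1). The Python sketch below builds that byte sequence and shows the 2:1 scaling curve the mouse applies to movement counts. It ignores the 0xFA acknowledgement the mouse sends after each byte and says nothing about how psmouse issues these commands; the function names are hypothetical.

```python
SET_SAMPLE_RATE = 0xF3
SET_SCALING_2_1 = 0xE7
SET_SCALING_1_1 = 0xE6
VALID_RATES = (10, 20, 40, 60, 80, 100, 200)  # samples per second

# Standard PS/2 2:1 scaling for movement counts 0..5; larger counts are doubled.
_SCALE_2_1 = {0: 0, 1: 1, 2: 1, 3: 3, 4: 6, 5: 9}


def setup_bytes(rate: int, scale_2_1: bool) -> bytes:
    """Command bytes to set the sample rate and scaling (ACKs not handled)."""
    if rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate {rate}")
    scaling = SET_SCALING_2_1 if scale_2_1 else SET_SCALING_1_1
    return bytes([SET_SAMPLE_RATE, rate, scaling])


def scale_2_1(count: int) -> int:
    """Apply 2:1 scaling to one signed movement count, preserving its sign."""
    mag = abs(count)
    scaled = _SCALE_2_1.get(mag, 2 * mag)
    return scaled if count >= 0 else -scaled
```

The scaling table shows why the setting changes how the pointer feels: small movements pass through almost unchanged, while larger ones are doubled, which amounts to a crude acceleration curve built into the mouse.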

In fairness, only my old Latitude has these PS/2 issues.

/Mikael


end of thread, other threads:[~2003-07-30 19:24 UTC | newest]

Thread overview: 24+ messages
2003-06-01  1:46 another must-fix: major PS/2 mouse problem Albert Cahalan
2003-06-04  5:47 ` Yoann
     [not found]   ` <20030603232155.1488c02f.akpm@digeo.com>
2003-06-04  7:47     ` Vojtech Pavlik
2003-06-04  7:53       ` Andrew Morton
2003-06-04  8:00         ` Vojtech Pavlik
2003-06-04  8:14           ` Andrew Morton
2003-06-04  8:40             ` Vojtech Pavlik
2003-06-04 19:20               ` Yoann
2003-06-04 23:09             ` Albert Cahalan
     [not found] ` <3EDCF47A.1060605@ifrance.com>
     [not found]   ` <1054681254.22103.3750.camel@cube>
     [not found]     ` <3EDD8850.9060808@ifrance.com>
2003-07-23  0:44       ` Albert Cahalan
2003-07-24 17:30         ` Andrew Morton
2003-07-25  1:46           ` Albert Cahalan
2003-07-26  3:19             ` Andrew Morton
2003-07-26 15:16               ` Zwane Mwaikambo
2003-07-29  2:55                 ` Albert Cahalan
2003-07-29  3:14                   ` Andrew Morton
2003-07-29 12:40                     ` Albert Cahalan
2003-07-29 18:58                       ` Andrew Morton
2003-07-29 19:36                         ` Zwane Mwaikambo
2003-07-29 19:43                         ` Chris Friesen
2003-07-30  5:08                     ` Pavel Machek
2003-07-30  6:32                       ` Andrew Morton
2003-07-30 12:29                       ` Albert Cahalan
2003-07-30 19:18 Mikael Pettersson
