* sky2 problems on Intel Mac Mini
@ 2007-01-29 23:57 Chris Lightfoot
  2007-01-30  0:01 ` Stephen Hemminger
  2007-01-30 19:15 ` Stephen Hemminger
  0 siblings, 2 replies; 9+ messages in thread
From: Chris Lightfoot @ 2007-01-29 23:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

  [ please cc: me on any reply ]

I'm seeing lots of problems with the sky2 driver on Mac
Minis. Based on the suggestions in,
    http://www.mail-archive.com/netdev@vger.kernel.org/msg28221.html
I am running stock 2.6.19 + the patches from the
mactel-linux.org site to get the kernel booting on the
Apple hardware; none of these touches the sky2 code. The
module is installed with disable_msi=1 and
idle_timeout=10; the chip version is,
    Yukon-EC (0xb6) rev 2

The crashes we're seeing at the moment show (with
debug=16) lots and lots of transmits being queued up and
never being completed, even with the timeout switched on.
For instance, (this is on a machine running NFS root and
vlans)

    [ lots of normal activity alternating tx queued / tx done ]
Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 65, len 150 
Jan 29 21:03:22 yeti kernel: sky2 eth0: rx slot 106 status 0x9e2100 len 154 
Jan 29 21:03:22 yeti kernel: eth0: tx done 66 
Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 67, len 150 
Jan 29 21:03:22 yeti kernel: sky2 eth0: rx slot 107 status 0x9e2100 len 154 
Jan 29 21:03:22 yeti kernel: eth0: tx done 68 
Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 69, len 150 
Jan 29 21:03:22 yeti kernel: sky2 eth0: rx slot 108 status 0x9e2100 len 154 
Jan 29 21:03:22 yeti kernel: eth0: tx done 70 
Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 71, len 89 
Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 73, len 1090
Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 75, len 1514
Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 79, len 90 
Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 81, len 1514 
Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 84, len 1090 
Jan 29 21:03:23 yeti kernel: eth0: tx queued, slot 86, len 98 
Jan 29 21:03:23 yeti kernel: eth0: tx queued, slot 88, len 1514 
Jan 29 21:03:23 yeti kernel: eth0: tx queued, slot 91, len 1090 
Jan 29 21:03:23 yeti kernel: eth0: tx queued, slot 93, len 54 
Jan 29 21:03:23 yeti kernel: eth0: tx queued, slot 94, len 66 
Jan 29 21:03:24 yeti kernel: eth0: tx queued, slot 95, len 54 
Jan 29 21:03:24 yeti kernel: eth0: tx queued, slot 96, len 66 
Jan 29 21:03:24 yeti kernel: eth0: tx queued, slot 97, len 98 
    [ ... and so on for a total of 109 tx queued with no tx done, after which
      our watchdog rebooted the machine ] 

-- though we've also seen, e.g., (no NFS root, no vlans)

Jan 28 19:32:16 t1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 28 19:32:16 t1 kernel: sky2 eth0: tx timeout
Jan 28 19:32:16 t1 kernel: sky2 eth0: transmit ring 115 .. 92 report=115 done=115
Jan 28 19:32:16 t1 kernel: sky2 hardware hung? flushing
Jan 28 19:32:25 t1 kernel: BUG: soft lockup detected on CPU#0!
Jan 28 19:32:25 t1 kernel:  [<c015495a>] softlockup_tick+0xba/0xe0
Jan 28 19:32:25 t1 kernel:  [<c01327e9>] update_process_times+0x39/0x90
Jan 28 19:32:25 t1 kernel:  [<c0117337>] smp_apic_timer_interrupt+0x97/0xc0
Jan 28 19:32:25 t1 kernel:  [<c0103eab>] apic_timer_interrupt+0x1f/0x24
Jan 28 19:32:25 t1 kernel:  [<c0445107>] _spin_lock_irqsave+0x67/0x80
Jan 28 19:32:25 t1 kernel:  [<c0445136>] _spin_lock_bh+0x6/0x20
Jan 28 19:32:25 t1 kernel:  [<c0302f40>] sky2_tx_clean+0x20/0x70
Jan 28 19:32:25 t1 kernel:  [<c0303904>] sky2_tx_timeout+0x144/0x1b0
Jan 28 19:32:25 t1 kernel:  [<c03da1c0>] dev_watchdog+0x0/0xe0
Jan 28 19:32:25 t1 kernel:  [<c03da28e>] dev_watchdog+0xce/0xe0
Jan 28 19:32:25 t1 kernel:  [<c0132916>] run_timer_softirq+0xc6/0x1c0
Jan 28 19:32:25 t1 kernel:  [<c0120c80>] scheduler_tick+0xb0/0x3a0
Jan 28 19:32:25 t1 kernel:  [<c012d1ea>] __do_softirq+0xca/0xf0
Jan 28 19:32:25 t1 kernel:  [<c012d245>] do_softirq+0x35/0x40
Jan 28 19:32:25 t1 kernel:  [<c012d295>] irq_exit+0x45/0x50
Jan 28 19:32:25 t1 kernel:  [<c011733c>] smp_apic_timer_interrupt+0x9c/0xc0
Jan 28 19:32:25 t1 kernel:  [<c0103eab>] apic_timer_interrupt+0x1f/0x24
Jan 28 19:32:25 t1 kernel:  [<c0101332>] mwait_idle_with_hints+0x32/0x40
Jan 28 19:32:25 t1 kernel:  [<c0101370>] mwait_idle+0x30/0x40
Jan 28 19:32:25 t1 kernel:  [<c0101144>] cpu_idle+0x94/0xe0
Jan 28 19:32:25 t1 kernel:  [<c0592a16>] start_kernel+0x1c6/0x230
Jan 28 19:32:25 t1 kernel:  [<c0592360>] unknown_bootoption+0x0/0x1e0
Jan 28 19:32:25 t1 kernel:  =======================

-- I assume this is just the same problem exhibiting on a
kernel with soft lockup detection enabled?
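Both logs describe transmits that are queued but never completed. When sifting through longer debug=16 captures, a small script can list the slots queued since the last `tx done`, and estimate how far apart the two indices in a `transmit ring A .. B` line are. This is a minimal sketch: the helper names are hypothetical, the regular expressions are based only on the message formats quoted above, and it assumes (unverified) that the first index is the consumer and the second the producer. The ring size of 512 is chosen purely for illustration; check the sky2 source for the kernel actually in use.

```python
import re
from typing import Iterable, List, Optional

# Patterns based on the sky2 debug=16 messages quoted in this thread.
QUEUED_RE = re.compile(r"tx queued, slot (\d+), len (\d+)")
DONE_RE = re.compile(r"tx done (\d+)")
RING_RE = re.compile(r"transmit ring (\d+) \.\. (\d+)")


def outstanding_tx(lines: Iterable[str]) -> List[int]:
    """Return the slots queued since the most recent 'tx done' message.

    Simplification: each 'tx done' is treated as completing everything
    queued before it, as in the normal alternating queued/done pattern.
    """
    pending: List[int] = []
    for line in lines:
        if DONE_RE.search(line):
            pending = []
            continue
        m = QUEUED_RE.search(line)
        if m:
            pending.append(int(m.group(1)))
    return pending


def ring_gap(line: str, ring_size: int) -> Optional[int]:
    """Forward distance from the first to the second index of a
    'transmit ring A .. B' line, modulo ring_size.

    ASSUMPTION: A is the consumer and B the producer index.
    """
    m = RING_RE.search(line)
    if m is None:
        return None
    first, second = int(m.group(1)), int(m.group(2))
    # Python's % returns a non-negative result for a positive modulus,
    # so a producer index that has wrapped past zero is handled.
    return (second - first) % ring_size


yeti_tail = [
    "eth0: tx done 70",
    "eth0: tx queued, slot 71, len 89",
    "eth0: tx queued, slot 73, len 1090",
]
print(outstanding_tx(yeti_tail))  # [71, 73]

t1 = "sky2 eth0: transmit ring 115 .. 92 report=115 done=115"
print(ring_gap(t1, ring_size=512))  # 489 (512 is an assumed ring size)
```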

Hopefully I should be able to actually log into one of
these machines over an alternate connection next time the
problem recurs, at which point I should be able to get
ethtool -d output. Anything else I should do at that
point?

Any suggestions for what to do next to chase this problem
down? I haven't yet tried the sk98lin driver on this
hardware; is that still worth doing? Are there any useful
tests we should try? Unfortunately, though these crashes
happen pretty frequently (several times per day
typically), I don't have a test case to reproduce one;
however, if it'd be useful, I can probably get a pcap
trace of the period immediately before the interface falls
over using port mirroring on the switch to which the
machines are connected. Is that likely to be informative?

-- 
``Is `colons' the plural of `semi-colon' if you know
  you have an even number of them?'' (David Richerby)

* Re: sky2 problems on Intel Mac Mini
  2007-01-29 23:57 sky2 problems on Intel Mac Mini Chris Lightfoot
@ 2007-01-30  0:01 ` Stephen Hemminger
  2007-01-30  8:39   ` Chris Lightfoot
  2007-01-30 19:15 ` Stephen Hemminger
  1 sibling, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2007-01-30  0:01 UTC (permalink / raw)
  To: Chris Lightfoot; +Cc: Stephen Hemminger, netdev

On Mon, 29 Jan 2007 23:57:32 +0000
Chris Lightfoot <chris@ex-parrot.com> wrote:

>   [ please cc: me on any reply ]
> 
> I'm seeing lots of problems with the sky2 driver on Mac
> Minis. Based on the suggestions in,
>     http://www.mail-archive.com/netdev@vger.kernel.org/msg28221.html
> I am running stock 2.6.19 + the patches from the
> mactel-linux.org site to get the kernel booting on the
> Apple hardware; none of these touches the sky2 code. The
> module is installed with disable_msi=1 and
> idle_timeout=10; the chip version is,
>     Yukon-EC (0xb6) rev 2
> 
> The crashes we're seeing at the moment show (with
> debug=16) lots and lots of transmits being queued up and
> never being completed, even with the timeout switched on.
> For instance, (this is on a machine running NFS root and
> vlans)

Is this NFS over UDP?

> [...]


-- 
Stephen Hemminger <shemminger@linux-foundation.org>

* Re: sky2 problems on Intel Mac Mini
  2007-01-30  0:01 ` Stephen Hemminger
@ 2007-01-30  8:39   ` Chris Lightfoot
  2007-01-30  9:40     ` Tino Keitel
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Lightfoot @ 2007-01-30  8:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

On Mon, Jan 29, 2007 at 04:01:17PM -0800, Stephen Hemminger wrote:
> On Mon, 29 Jan 2007 23:57:32 +0000
> Chris Lightfoot <chris@ex-parrot.com> wrote:
> 
> >   [ please cc: me on any reply ]
> > 
> > I'm seeing lots of problems with the sky2 driver on Mac
> > Minis. Based on the suggestions in,
> >     http://www.mail-archive.com/netdev@vger.kernel.org/msg28221.html
> > I am running stock 2.6.19 + the patches from the
> > mactel-linux.org site to get the kernel booting on the
> > Apple hardware; none of these touches the sky2 code. The
> > module is installed with disable_msi=1 and
> > idle_timeout=10; the chip version is,
> >     Yukon-EC (0xb6) rev 2
> > 
> > The crashes we're seeing at the moment show (with
> > debug=16) lots and lots of transmits being queued up and
> > never being completed, even with the timeout switched on.
> > For instance, (this is on a machine running NFS root and
> > vlans)
> 
> Is this NFS over UDP?

yes. but we see similar problems on machines which aren't
doing lots of UDP traffic.

-- 
          Man: How was your flight, sir?
Prince Philip: Have you ever been on an aeroplane?
          Man: Yes.
Prince Philip: Well, it was like that.

* Re: sky2 problems on Intel Mac Mini
  2007-01-30  8:39   ` Chris Lightfoot
@ 2007-01-30  9:40     ` Tino Keitel
  2007-01-30 23:21       ` Stephen Hemminger
  0 siblings, 1 reply; 9+ messages in thread
From: Tino Keitel @ 2007-01-30  9:40 UTC (permalink / raw)
  To: Chris Lightfoot; +Cc: Stephen Hemminger, netdev

[-- Attachment #1: Type: text/plain, Size: 1556 bytes --]

On Tue, Jan 30, 2007 at 08:39:19 +0000, Chris Lightfoot wrote:
> On Mon, Jan 29, 2007 at 04:01:17PM -0800, Stephen Hemminger wrote:
> > On Mon, 29 Jan 2007 23:57:32 +0000
> > Chris Lightfoot <chris@ex-parrot.com> wrote:
> > 
> > > [...]
> > 
> > Is this NFS over UDP?
> 
> yes. but we see similar problems on machines which aren't
> doing lots of UDP traffic.

Hi,

I found my machine with a frozen desktop this morning. I don't use
anything UDP-specific, just a lot of TCP traffic. The machine is a
Mac mini Core Duo running 2.6.20-rc6 without special mactel-linux.org
patches except for the IR remote driver. The kernel log looks similar.
I don't use MSI. However, I don't get this several times a day. IIRC
this was the first time that I saw this.

Regards,
Tino

[-- Attachment #2: sky2_freeze.txt.bz2 --]
[-- Type: application/octet-stream, Size: 8002 bytes --]

* Re: sky2 problems on Intel Mac Mini
  2007-01-29 23:57 sky2 problems on Intel Mac Mini Chris Lightfoot
  2007-01-30  0:01 ` Stephen Hemminger
@ 2007-01-30 19:15 ` Stephen Hemminger
  2007-01-31  0:09   ` Chris Lightfoot
  1 sibling, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2007-01-30 19:15 UTC (permalink / raw)
  To: Chris Lightfoot; +Cc: Stephen Hemminger, netdev

There are a couple problems here:

1) the transmitter is getting hung.
2) the recovery logic doesn't work. If I can reproduce the hang,
   then maybe the recovery code can be fixed.

Let's address the transmitter hang first.
The transmitter has multiple stages, so the cause could be any of:
a) hardware flow control problems
   look at ethtool -S eth0 statistics, are there flow control packets
   showing up?
b) GMAC or ram buffer issues
   looking at 'ethtool -d eth0' output can help, but finding these setup
   errors is like looking for a needle in a haystack.
 
   The sky2 driver copies most of its code from the vendor version of
   sk98lin, but if sk98lin works and sky2 doesn't, comparing register
   settings can give hints.

c) DMA problems
   For some problems, I have had luck adding a /proc interface and dumping
   the transmit ring after a hang.  Looking at the last control block that
   hung can help.  This found the case where IPV6 TSO was leaking through.

d) checksum problems
   Turning off tx scatter/gather forces non-fragmented skbs. This hurts
   performance, but it can tell whether the problem is in the fragment code.
   Turning off tx checksum turns off scatter/gather, checksumming and
   TSO.

e) possible alignment and flow control interaction
   The receive DMA engine has hardware bugs: it requires alignment, or it
   doesn't work with flow control. I still wonder if there are alignment
   bugs on Tx with flow control.

f) other driver bug
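Item (d) can be run as two stages. The sketch below only builds and prints `ethtool -K` command lines (`sg`, `tx` and `tso` are the standard ethtool feature keywords); the helper names and the staging order are illustrative assumptions, and actually applying a command requires root and `dry_run=False`.

```python
import shlex
import subprocess
from typing import List, Tuple

# Hypothetical staging for item (d): scatter/gather alone first, then
# tx checksumming, which (per item d) also turns off sg and TSO.
STAGES: List[Tuple[str, ...]] = [("sg",), ("tx",)]


def offload_off_cmd(iface: str, features: Tuple[str, ...]) -> List[str]:
    """Build an `ethtool -K` command that switches the given features off."""
    cmd = ["ethtool", "-K", iface]
    for feat in features:
        cmd += [feat, "off"]
    return cmd


def apply(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command, or run it (needs root) when dry_run is False."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


for stage in STAGES:
    apply(offload_off_cmd("eth0", stage))
# prints:
# ethtool -K eth0 sg off
# ethtool -K eth0 tx off
```

Disabling `sg` first keeps checksum offload on, so if the hangs stop at that stage the fragment code becomes the prime suspect, as item (d) suggests; the second stage then removes checksumming and TSO as well.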

To save time, I'll go get a new Mac Mini and try and clone this setup.
Could you send me a full kernel config (and other setup information
like filesystem type, distro, etc.)?


> -- I assume this is just the same problem exhibiting on a
> kernel with soft lockup detection enabled?
> 
> Hopefully I should be able to actually log into one of
> these machines over an alternate connection next time the
> problem recurs, at which point I should be able to get
> ethtool -d output. Anything else I should do at that
> point?
> 
> Any suggestions for what to do next to chase this problem
> down? I haven't yet tried the sk98lin driver on this
> hardware; is that still worth doing? Are there any useful
> tests we should try? Unfortunately, though these crashes
> happen pretty frequently (several times per day
> typically), I don't have a test case to reproduce one;
> however, if it'd be useful, I can probably get a pcap
> trace of the period immediately before the interface falls
> over using port mirroring on the switch to which the
> machines are connected. Is that likely to be informative?
> 

The vendor driver does some slightly different setup, but it also
does a hardware reset when inactive (every 10ms).


-- 
Stephen Hemminger <shemminger@linux-foundation.org>

* Re: sky2 problems on Intel Mac Mini
  2007-01-30  9:40     ` Tino Keitel
@ 2007-01-30 23:21       ` Stephen Hemminger
  2007-01-30 23:24         ` Tino Keitel
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2007-01-30 23:21 UTC (permalink / raw)
  To: Tino Keitel; +Cc: Chris Lightfoot, netdev

On Tue, 30 Jan 2007 10:40:33 +0100
Tino Keitel <tino.keitel@tikei.de> wrote:

> [...]
> 
> Hi,
> 
> I found my machine with a frozen desktop this morning. I don't use
> anything UDP-specific, just a lot of TCP traffic. The machine is a
> Mac mini Core Duo running 2.6.20-rc6 without special mactel-linux.org
> patches except for the IR remote driver. The kernel log looks similar.
> I don't use MSI. However, I don't get this several times a day. IIRC
> this was the first time that I saw this.
> 
> Regards,
> Tino

Are you running 64 bit (x86-64) or 32 bit (i386)?

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

* Re: sky2 problems on Intel Mac Mini
  2007-01-30 23:21       ` Stephen Hemminger
@ 2007-01-30 23:24         ` Tino Keitel
  0 siblings, 0 replies; 9+ messages in thread
From: Tino Keitel @ 2007-01-30 23:24 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Tino Keitel, Chris Lightfoot, netdev

On Tue, Jan 30, 2007 at 15:21:50 -0800, Stephen Hemminger wrote:
> On Tue, 30 Jan 2007 10:40:33 +0100
> Tino Keitel <tino.keitel@tikei.de> wrote:

[...]

> > Hi,
> > 
> > I found my machine with a frozen desktop this morning. I don't use
> > anything UDP-specific, just a lot of TCP traffic. The machine is a
> > Mac mini Core Duo running 2.6.20-rc6 without special mactel-linux.org
> > patches except for the IR remote driver. The kernel log looks similar.
> > I don't use MSI. However, I don't get this several times a day. IIRC
> > this was the first time that I saw this.
> > 
> > Regards,
> > Tino
> 
> Are you running 64 bit (x86-64) or 32 bit (i386)?

It's a Core Duo, so it is 32 bit.

Regards,
Tino

* Re: sky2 problems on Intel Mac Mini
  2007-01-30 19:15 ` Stephen Hemminger
@ 2007-01-31  0:09   ` Chris Lightfoot
       [not found]     ` <iUr0b79BEZdD.1otwEM+70thgeGTyfwhw1g@sphinx.mythic-beasts.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Lightfoot @ 2007-01-31  0:09 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Stephen Hemminger, netdev

On Tue, Jan 30, 2007 at 11:15:20AM -0800, Stephen Hemminger wrote:
> a) hardware flow control problems
>    look at ethtool -S eth0 statistics, are there flow control packets
>    showing up?

on yeti (the machine from which I quoted the first log
output),

[root@yeti /root]# /root/ethtool -S eth0 | grep mac_pause
     tx_mac_pause: 0
     rx_mac_pause: 8649

and on t1 both 0.

But presumably you want to know this at the point of the
failure -- I'll add it to the things the watchdog records
before rebooting.
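A minimal sketch of such a watchdog hook, assuming Python is available on the affected machines: it saves the `ethtool -S` and `ethtool -d` output to a file and returns the pause counters so they can be logged alongside the crash. `snapshot`, `parse_ethtool_stats` and the injectable `runner` argument are hypothetical names; the runner exists only so the logic can be exercised without the hardware.

```python
import subprocess
import time
from typing import Callable, Dict, List


def run_cmd(argv: List[str]) -> str:
    # check=False: on a wedged NIC ethtool may exit non-zero; keep whatever
    # output it produced rather than aborting the snapshot.
    res = subprocess.run(argv, capture_output=True, text=True, check=False)
    return res.stdout


def parse_ethtool_stats(text: str) -> Dict[str, int]:
    """Parse `ethtool -S` output ("name: value" lines) into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skips the "NIC statistics:" header, whose value part is empty.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def snapshot(iface: str, path: str,
             runner: Callable[[List[str]], str] = run_cmd) -> Dict[str, int]:
    """Save `ethtool -S` and `ethtool -d` output to path; return pause counters."""
    stats_text = runner(["ethtool", "-S", iface])
    regs_text = runner(["ethtool", "-d", iface])
    with open(path, "w") as f:
        f.write(f"# {time.strftime('%Y-%m-%d %H:%M:%S')} {iface}\n")
        f.write("# ethtool -S\n" + stats_text)
        f.write("# ethtool -d\n" + regs_text)
    stats = parse_ethtool_stats(stats_text)
    # The counters discussed in this thread: tx_mac_pause and rx_mac_pause.
    return {k: v for k, v in stats.items() if k.endswith("mac_pause")}
```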

> b) GMAC or ram buffer issues
>    looking at 'ethtool -d eth0' output can help, but finding these setup
>    errors is like looking for a needle in a haystack.
>  
>    The sky2 driver copies most of its code from the vendor version of
>    sk98lin, but if sk98lin works and sky2 doesn't, comparing register
>    settings can give hints.

ok. I'll try to get one of these machines running the
vendor driver to see whether the problems still occur.

> c) DMA problems
>    For some problems, I have had luck adding a /proc interface and dumping
>    the transmit ring after a hang.  Looking at the last control block that
>    hung can help.  This found the case where IPV6 TSO was leaking through.
> 
> d) checksum problems
>    Turning off tx scatter/gather forces non-fragmented skbs. This hurts
>    performance, but it can tell whether the problem is in the fragment code.
>    Turning off tx checksum turns off scatter/gather, checksumming and
>    TSO.

also seems worth trying, though without a test case it'll
take a while to be sure what was causing the problem.

> e) possible alignment and flow control interaction
>    The receive DMA engine has hardware bugs: it requires alignment, or it
>    doesn't work with flow control. I still wonder if there are alignment
>    bugs on Tx with flow control.
> 
> f) other driver bug
> 
> To save time, I'll go get a new Mac Mini and try and clone this setup.
> Could you send me a full kernel config (and other setup information
> like filesystem type, distro, etc.)?

we've seen this on lots of different machines; yeti is
NFS-root, originally ancient Redhat plus lots of
locally-built packages with some bits of the filesystem on
ext3. t1 is Ubuntu (`edgy' I think) on ext3. The same
problems occur on Debian `sarge' and CentOS, though.

What I haven't yet managed to do is to reproduce the
problem -- the test machine on my desk (also NFS-root)
has never exhibited it. But it's mostly idle.

    [...]
> The vendor driver does some slightly different setup, but it also
> does a hardware reset when inactive (every 10ms).

!!!

-- 
``I have a sneaking sympathy for Belgium, as a land where, by accident of
  geography, too often other people have chosen to hold their wars.''
  (Alan Follett)

* Re: sky2 problems on Intel Mac Mini
       [not found]     ` <iUr0b79BEZdD.1otwEM+70thgeGTyfwhw1g@sphinx.mythic-beasts.com>
@ 2007-01-31 16:48       ` Chris Lightfoot
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Lightfoot @ 2007-01-31 16:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Stephen Hemminger, netdev

On Wed, Jan 31, 2007 at 09:15:22AM +0000, Chris Lightfoot wrote:
> On Wed, Jan 31, 2007 at 12:09:37AM +0000, Chris Lightfoot wrote:
> > On Tue, Jan 30, 2007 at 11:15:20AM -0800, Stephen Hemminger wrote:
> > > a) hardware flow control problems
> > >    look at ethtool -S eth0 statistics, are there flow control packets
> > >    showing up?
> 
> and immediately after a transmitter lockup this morning,
>     tx_mac_pause: 1
>     rx_mac_pause: 10601
> (I've attached a full dump of the stats and registers in
> case it's of use).

but after a crash this afternoon,

     tx_mac_pause: 0
     rx_mac_pause: 916

-- 
``[You] couldn't get a ticket for love nor money....
  Although, actually, I can't be sure of this
  since I only really offered money.'' (Daniel Davies)
