From: ebiederm@xmission.com (Eric W. Biederman)
To: David Dillow <dave@thedillows.org>
Cc: "Michael Riepe" <michael.riepe@googlemail.com>,
	"Michael Buesch" <mb@bu3sch.de>,
	"Francois Romieu" <romieu@fr.zoreil.com>,
	"Rui Santos" <rsantos@grupopie.com>,
	"Michael Büker" <m.bueker@berlin.de>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
Date: Fri, 21 Aug 2009 13:57:49 -0700
Message-ID: <m1skfkrik2.fsf@fess.ebiederm.org>
In-Reply-To: <1243042174.3580.23.camel@obelisk.thedillows.org> (David Dillow's message of "Fri\, 22 May 2009 21\:29\:34 -0400")

David Dillow <dave@thedillows.org> writes:

> The 8169 chip only generates MSI interrupts when all enabled event
> sources are quiescent and one or more sources transition to active. If
> not all of the active events are acknowledged, or a new event becomes
> active while the existing ones are cleared in the handler, we will not
> see a new interrupt.
>
> The current interrupt handler masks off the Rx and Tx events once the
> NAPI handler has been scheduled, which opens a race window in which we
> can get another Rx or Tx event and never ACK'ing it, stopping all
> activity until the link is reset (ifconfig down/up). Fix this by always
> ACK'ing all event sources, and loop in the handler until we have all
> sources quiescent.
>
> Signed-off-by: David Dillow <dave@thedillows.org>
> ---
> This fixes the lockups I've seen. Both MSI and level-triggered interrupt
> configurations survive over an hour of testing when it would lockup in
> under 90 seconds before. I am certain of the analysis of the root cause,
> but there may be better ways to fix it. There may also be a theoretical
> race window between the ending of a NAPI poll cycle and a link change
> interrupt coming in, but I'm not sure it would matter. 
>
> Some variant of this should also be applied to the currently running
> stable trees, as the problem is long-standing.

I have what at first glance looks like a problem caused by this
patch.  For the last month, since upgrading one of my machines from
2.6.28 to 2.6.30, it has been becoming inaccessible from the
network, and I have a few:

NETDEV WATCHDOG: eth0 (r8169): transmit timed out

in my logs, along with a lot of soft lockups that always show
rtl8169_interrupt as the function that is running.  I suspect your
patch has introduced a near-infinite loop in the interrupt handler
and is causing these soft lockups.
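
To make the suspicion concrete, here is a minimal user-space sketch in C
of the behaviour described in the changelog above. It is not the driver
code. Everything in it is an assumption made for illustration: the
sim_nic structure, the EV_* bits (not the real r8169 register layout),
the arrivals_left counter that stands in for events arriving while the
handler runs, and the max_passes bound. It models status bits that are
cleared by an ACK, an interrupt that is raised only on the
quiescent-to-active transition, a handler that ACKs a single snapshot,
and a handler that loops until the status reads quiescent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event bits, NOT the real r8169 register layout. */
enum { EV_RX = 1u << 0, EV_TX = 1u << 1, EV_LINK = 1u << 2 };

/* Hypothetical model of the device state relevant to the race. */
struct sim_nic {
    uint16_t status;        /* latched event bits, cleared by sim_ack() */
    uint16_t enabled;       /* enabled event sources */
    unsigned irq_count;     /* interrupts raised so far */
    unsigned arrivals_left; /* events still to arrive mid-handler */
};

/* Raise an event. An interrupt is generated only if every enabled
 * source was quiescent before this event became active. */
static void sim_raise(struct sim_nic *nic, uint16_t ev)
{
    int was_quiescent = (nic->status & nic->enabled) == 0;

    nic->status |= ev;
    if (was_quiescent && (nic->status & nic->enabled))
        nic->irq_count++;
}

static void sim_ack(struct sim_nic *nic, uint16_t bits)
{
    nic->status &= (uint16_t)~bits;
}

/* Model an event arriving between the status read and the ACK.
 * It picks a bit that is not currently latched. */
static void sim_event_arrives(struct sim_nic *nic)
{
    if (nic->arrivals_left == 0)
        return;
    nic->arrivals_left--;
    sim_raise(nic, (nic->status & EV_RX) ? EV_TX : EV_RX);
}

/* ACK only the snapshot that was read: an event that arrives in the
 * window stays latched and no new interrupt is ever raised. */
static void handler_ack_once(struct sim_nic *nic)
{
    uint16_t seen = nic->status & nic->enabled;

    sim_event_arrives(nic);
    sim_ack(nic, seen);
}

/* Loop until quiescent. The max_passes bound is an assumption of this
 * sketch; without it the loop spins for as long as events keep
 * arriving. Returns the number of passes taken. */
static int handler_ack_loop(struct sim_nic *nic, int max_passes)
{
    uint16_t seen;
    int passes = 0;

    assert(max_passes > 0);
    while (passes < max_passes &&
           (seen = nic->status & nic->enabled) != 0) {
        sim_event_arrives(nic);
        sim_ack(nic, seen);
        passes++;
    }
    return passes;
}
```

With a single event arriving mid-handler, handler_ack_once() leaves
EV_RX latched, so no further interrupt is raised, while
handler_ack_loop() drains it in two passes. If events keep arriving,
the loop ends only because of max_passes, which is the kind of spinning
I suspect here. What a real driver should do once such a bound is hit is
not something this sketch answers.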

Any ideas?

Eric

BUG: soft lockup - CPU#3 stuck for 61s! [swapper:0]
CPU 3:
Pid: 0, comm: swapper Tainted: G        W  2.6.30-170263.2006.Arora.fc11.x86_64 #1 G33M-S2
RIP: 0010:[<ffffffffa01deacd>]  [<ffffffffa01deacd>] rtl8169_interrupt+0x26f/0x2b7 [r8169]
RSP: 0018:ffff880028070cb0  EFLAGS: 00000206
RAX: 0000000000000050 RBX: ffff880028070d10 RCX: ffff88002807b9e0
RDX: ffffc2000065c03e RSI: ffff88012d79a000 RDI: 0000000000000246
RBP: ffffffff8100c9d3 R08: ffff88012fae0000 R09: ffff880028070ec0
R10: 077321422cb06619 R11: 000000003c5efb73 R12: ffff880028070c30
R13: ffff88012d79a000 R14: ffff88012d79a600 R15: 077321422cb06619
FS:  0000000000000000(0000) GS:ffff88002806d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fc10010c000 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>  [<ffffffff81093f0b>] ? handle_IRQ_event+0x6a/0x13f
 [<ffffffff810219fa>] ? apic_write+0x24/0x3a
 [<ffffffff8109607a>] ? handle_edge_irq+0xdb/0x138
 [<ffffffff81012fbd>] ? native_sched_clock+0x2d/0x54
 [<ffffffff8100e996>] ? handle_irq+0x95/0xb7
 [<ffffffff8100df42>] ? do_IRQ+0x6a/0xe9
 [<ffffffff8100c853>] ? ret_from_intr+0x0/0x11
 [<ffffffff8104ba16>] ? __do_softirq+0x5e/0x1b0
 [<ffffffff8100cfcc>] ? call_softirq+0x1c/0x28
 [<ffffffff8100e721>] ? do_softirq+0x51/0xae
 [<ffffffff8104b6d2>] ? irq_exit+0x52/0xa3
 [<ffffffff81020f11>] ? smp_apic_timer_interrupt+0x94/0xb8
 [<ffffffff8100c9d3>] ? apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff81014096>] ? mwait_idle+0x9b/0xcc
 [<ffffffff81014038>] ? mwait_idle+0x3d/0xcc
 [<ffffffff8100ae08>] ? enter_idle+0x33/0x49
 [<ffffffff8100aece>] ? cpu_idle+0xb0/0xf3
 [<ffffffff8136f30c>] ? start_secondary+0x19c/0x1b7


