Date: Thu, 14 May 2009 15:14:17 -0400
From: David Dillow
Subject: Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
To: Michael Riepe
Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker, linux-kernel@vger.kernel.org, netdev@vger.kernel.org

On Thu, 2009-05-14 at 20:37 +0200, Michael Riepe wrote:
> 
> David Dillow wrote:
> > On Tue, 2009-05-12 at 22:29 +0200, Michael Riepe wrote:
> > The patched driver runs on 2.6.27 and survives my 5 minutes 'dd
> > if=/dev/zero bs=1024k | nc target 9000' test which usually dies in less
> > than 90 seconds on 2.6.28+.
> 
> Not on my system:
> This happened less than half a minute after the transfer had started.
> And it's going to happen earlier if I increase the load.
> With four connections to two other hosts, the transmission usually
> pauses after less than ten seconds. Sometimes it lasts for only two or
> three seconds.

Bummer, but a good data point; thanks for testing.

I added some code to print the IRQ status when it hangs. It shows
0x0085, which is RxOK | TxOK | TxDescUnavail, and that makes me think
we've lost an edge-triggered MSI interrupt somehow.

Since you can reproduce it on 2.6.27 and I cannot, I suspect the genirq
commit the bisection landed on just changed the timing and made the
problem easier to hit after it was merged.

So I suppose a careful review of the IRQ handling in r8169.c is in
order. That said, my SATA disks (AHCI with MSI IRQs) also seem to show
similar delays, though that observation is entirely unqualified and
unmeasured.

Dave