From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755273AbZHYVqh (ORCPT );
	Tue, 25 Aug 2009 17:46:37 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1751697AbZHYVqh (ORCPT );
	Tue, 25 Aug 2009 17:46:37 -0400
Received: from emroute3.ornl.gov ([160.91.4.110]:46914 "EHLO emroute3.ornl.gov"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751441AbZHYVqg (ORCPT );
	Tue, 25 Aug 2009 17:46:36 -0400
Date: Tue, 25 Aug 2009 17:46:35 -0400
From: David Dillow
Subject: Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
In-reply-to:
To: "Eric W. Biederman"
Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Message-id: <1251236795.9607.34.camel@lap75545.ornl.gov>
MIME-version: 1.0
X-Mailer: Evolution 2.24.5 (2.24.5-2.fc10)
Content-type: text/plain
Content-transfer-encoding: 7bit
References: <200903041828.49972.m.bueker@berlin.de>
	<1242001754.4093.12.camel@obelisk.thedillows.org>
	<200905112248.44868.mb@bu3sch.de>
	<200905112310.08534.mb@bu3sch.de>
	<1242077392.3716.15.camel@lap75545.ornl.gov>
	<4A09DC3E.2080807@googlemail.com>
	<1242268709.4979.7.camel@obelisk.thedillows.org>
	<4A0C6504.8000704@googlemail.com>
	<1242328457.32579.12.camel@lap75545.ornl.gov>
	<4A0C7443.1010000@googlemail.com>
	<1243042174.3580.23.camel@obelisk.thedillows.org>
	<1250895567.23419.1.camel@obelisk.thedillows.org>
	<1250897657.23419.5.camel@obelisk.thedillows.org>
	<1250973787.3582.14.camel@obelisk.thedillows.org>
	<1251169150.4023.11.camel@obelisk.thedillows.org>
	<1251232848.9607.15.camel@lap75545.ornl.gov>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2009-08-25 at 14:24 -0700, Eric W.
Biederman wrote:
> David Dillow writes:
> > I'm curious how you managed to receive a packet between us clearing
> > all the current sources and reading the current source list
> > continuously for 60+ seconds -- the loop is basically
> >
> > 	status = get IRQ events from chip
> > 	while (status) {
> > 		/* process events, start NAPI if needed */
> > 		clear current events from chip
> > 		status = get IRQ events from chip
> > 	}
> >
> > That seems like a very small race window to consistently hit --
> > especially for long enough to trigger soft lockups.
>
> Interesting indeed. When I hit the guard we had popped out of NAPI
> mode while we were in the loop. The only way to do that is if
> poll and interrupt were running on different cpus.

That is the normal case on an SMP machine, but again, that race window
should be fairly small as well -- from the __napi_schedule() to the
acking of the interrupt source is only a few lines of code, most of
which is in an error case that is skipped. Granted, there may be a fair
number of instructions there if debugging or tracing is on -- I've not
checked -- but even then, hitting that race consistently for 60+
seconds doesn't seem likely.

Being out of NAPI in the guard may be a red herring -- it doesn't tell
us how long you had been out of NAPI when you hit it. If there's a
stuck bit somewhere, then you could have been out of NAPI after the
first cycle and we'd have no way to tell. You could add some variables
to keep track of the status and mask values, and how long ago they
changed, to see.

> I am a bit curious about TxDescUnavail. Perhaps we had a temporary
> memory shortage and that is what was screaming? I don't think we do
> anything at all with that state.

TxDescUnavail is normal -- it means the chip finished sending
everything we asked it to.

> Perhaps the flaw here is simply not masking TxDescUnavail while we are
> in NAPI mode?
No, we never enable it on the chip, and it gets masked out when we
decide if we want to go to NAPI mode -- it is not set in
tp->napi_event:

	if (status & tp->intr_mask & tp->napi_event) {
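To make that masking decision concrete, here is a minimal sketch of the
check -- the bit values and the wants_napi() helper are made up for
illustration and do not match the real r8169 register layout; the point
is only that a bit absent from napi_event (like TxDescUnavail here) can
never push us into NAPI mode:

```c
#include <assert.h>

/* Illustrative event bits -- the real r8169 values differ. What matters
 * is that TxDescUnavail is deliberately left out of napi_event. */
#define RxOK          0x0001u
#define TxOK          0x0004u
#define TxDescUnavail 0x0080u

static const unsigned napi_event = RxOK | TxOK; /* no TxDescUnavail */
static const unsigned intr_mask  = 0xffffu;     /* nothing masked */

/* Hypothetical helper mirroring the quoted check: should this status
 * word schedule NAPI polling? */
static int wants_napi(unsigned status)
{
	return (status & intr_mask & napi_event) != 0;
}
```

So even if TxDescUnavail is screaming, this check ignores it; only the
bits in napi_event can start a poll cycle.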
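For reference, the ack loop quoted earlier can be sketched against a
mock status register -- fake_intr_status, read_status(), and
ack_status() are stand-ins for this sketch, not the driver's real
accessors:

```c
#include <assert.h>

/* Mock of the chip's latched interrupt-status register: reading returns
 * the pending event bits; acking clears the bits written back. */
static unsigned fake_intr_status;

static unsigned read_status(void)  { return fake_intr_status; }
static void ack_status(unsigned s) { fake_intr_status &= ~s; }

/* The loop from the thread: keep re-reading status until it is clear,
 * so an event that latches between the ack and the re-read is handled
 * on the next iteration rather than lost. Returns the number of
 * iterations, for inspection. */
static int handle_irq(void)
{
	int handled = 0;
	unsigned status = read_status();

	while (status) {
		/* process events, start NAPI if needed (elided) */
		ack_status(status);
		handled++;
		status = read_status();
	}
	return handled;
}
```

The race being discussed has to land in the narrow window between
ack_status() and the following read_status() on every iteration, which
is why hitting it continuously for 60+ seconds looks so implausible.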