From mboxrd@z Thu Jan 1 00:00:00 1970
From: lsorense@csclub.uwaterloo.ca (Lennart Sorensen)
Subject: Re: Re: Strange connection slowdown on pcnet32
Date: Fri, 16 Feb 2007 15:23:00 -0500
Message-ID: <20070216202300.GD7585@csclub.uwaterloo.ca>
References: <32943920.1119801171642884331.JavaMail.root@vms226.mailsrvcs.net>
 <20070216172110.GC7582@csclub.uwaterloo.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netdev@vger.kernel.org
To: pcnet32@verizon.net
Return-path:
Received: from caffeine.uwaterloo.ca ([129.97.134.17]:51653
 "EHLO caffeine.csclub.uwaterloo.ca" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S1946127AbXBPUXB (ORCPT );
 Fri, 16 Feb 2007 15:23:01 -0500
Content-Disposition: inline
In-Reply-To: <20070216172110.GC7582@csclub.uwaterloo.ca>
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

On Fri, Feb 16, 2007 at 12:21:10PM -0500, Lennart Sorensen wrote:
> On Fri, Feb 16, 2007 at 10:21:24AM -0600, pcnet32@verizon.net wrote:
> > Are there any messages in the log about timeouts, or anything else
> > from the driver? When it gets in this state, can you communicate
> > with another system, and does it have the same slow behavior?
>
> Nope no timeouts or messages. As far as the system looks, cpu and ram and
> logs show nothing unusual. Just very slow reception on the ethernet port
> going towards the server providing the data for the transfer. Messages do
> get through eventually, but very very late (when a ping reply arrives at
> the port and takes 5 to 10 seconds to make it to the network stack, then
> something isn't right, at least when there is no other traffic waiting).
>
> I did have NAPI in the driver even in 2.6.8 (I was adding that at the
> time). I am now testing with 2.6.8 without NAPI (so no mask/unmask of
> receive interrupts taking place), and so far it has run for over an hour
> without failing, although that doesn't prove it won't, just that it has
> lasted longer.
>
> I think I will try compiling 2.6.18 again with NAPI disabled on the
> pcnet32 and see what that does. There is a chance that something in the
> NAPI implementation is breaking the chip's receive somehow although I
> can't currently imagine what it could be or how.

So I have determined that when the port gets "stuck/slow" it is hitting
this problem (in pcnet32_rx):

	while (quota > npackets && (short)le16_to_cpu(rxp->status) >= 0) {
		if (netif_msg_intr(lp))
			printk(KERN_DEBUG "%s: pcnet32_rx npackets %d\n",
			       dev->name, npackets);
		pcnet32_rx_entry(dev, lp, rxp, entry);
		npackets += 1;
		/*
		 * The docs say that the buffer length isn't touched, but Andrew
		 * Boyd of QNX reports that some revs of the 79C965 clear it.
		 */
		rxp->buf_length = le16_to_cpu(2 - PKT_BUF_SZ);
		wmb();	/* Make sure owner changes after others are visible */
		rxp->status = le16_to_cpu(0x8000);
		entry = (++lp->cur_rx) & lp->rx_mod_mask;
		rxp = &lp->rx_ring[entry];
	}

Unfortunately rxp->status reads as 0x8000 for a long time, and then
eventually changes to 0x0310 at which point the receive happens. Until
that happens, the poll is called about once per second and each time
returns that 0 packets were received but that more packets are waiting.

I can't figure out why it would get a status of 0x8000 which means that
the MAC hasn't changed the ownership flag on the packet yet, even though
it generated a receive interrupt multiple seconds ago.

Could it be some caching issue that makes the cpu not realize that the
memory has in fact been changed by DMA? Any way to force a cache update
for a memory location? The CPU is a Geode SC1200 (Geode GX1 + Companion
in one). So far I have seen __memcpy from system ram to device memory get
data out of order, so I have no reason to believe the cpu doesn't have
more stupid bugs related to doing I/O.

--
Len Sorensen
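
For reference, here is a small userspace sketch of how the loop above reads the descriptor status word. It is only an illustration, not driver code: the names RX_OWN, rx_host_owns, rx_host_owns_mask, rx_frame_complete and rx_status_selfcheck are made up for this example, and the cast to int16_t assumes the usual two's complement behaviour. It shows that the "(short)status >= 0" test is the same as testing that bit 15 (the ownership flag) is clear, and that 0x0310 has 0x03 in its high byte, which I am assuming is what the driver treats as a complete, error-free frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 15 of the status word: set while the chip
 * still owns the descriptor, cleared by the chip once a frame is stored. */
#define RX_OWN 0x8000u

/* Same test as the while() condition: with bit 15 set the value is
 * negative as a signed 16-bit number (assumes two's complement). */
static bool rx_host_owns(uint16_t status)
{
	return (int16_t)status >= 0;
}

/* Equivalent test written as an explicit mask on the ownership bit. */
static bool rx_host_owns_mask(uint16_t status)
{
	return (status & RX_OWN) == 0;
}

/* Assumption for this sketch: a high byte of exactly 0x03 means a
 * complete frame with no error bits set. */
static bool rx_frame_complete(uint16_t status)
{
	return (status >> 8) == 0x03;
}

/* Checks the two values reported in this mail. */
static void rx_status_selfcheck(void)
{
	assert(!rx_host_owns(0x8000));      /* chip still owns it: loop exits */
	assert(rx_host_owns(0x0310));       /* host owns it: loop body runs */
	assert(rx_frame_complete(0x0310));  /* high byte is 0x03 */
}
```

Calling rx_status_selfcheck() passes for both values seen here, so a read of 0x8000 really does mean the CPU sees the descriptor as still owned by the chip; the sketch says nothing about why that view is stale.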